Foreword


Do you have a good idea but despair of testing it because of 
the mountain of bureaucratic obstacles in your way and 
the impenetrable trial terminology? Then this book is for 
you. It has been written by well-known experts and will 
act as a roadmap to help you reach your goal. Once you 
have studied it, you will have a better idea of what your 
journey entails—with all its milestones—and you will feel 
that your destination is within reach. 

Reading this book will be a pleasure in itself. As with all
map reading, the most important points are where you are
now (your idea) and where you want to go (demonstrated
clinical benefit). Today, good research certainly requires
navigation along this route.

The creative curiosity that brought you to the starting 
point is the heart and soul of science, and your ambition 
to reach the goal is the essence of medicine: to help sick 
people. We wish you a good journey! 


Per Aspenberg 
Anders Rydholm 
Preface


Research is vital to the advancement of medicine. Although 
evidence-based medicine has promoted better research 
practices and improved translation of evidence to clinical 
practice, the same has not been true of research in surgery. 
Several challenges have led to highly variable quality in 
surgical research and thus limited application to daily sur- 
gical practice. Lack of trained surgeon scientists has been 
one key barrier. Surgical research in most academic set- 
tings is commonly conducted in an extremely difficult en- 
vironment of competing priorities: a busy surgical practice 
versus pressures by university departments to increase 
academic output. The consequence of limited study de- 
signs, incomplete resources to complete studies, and in- 
ability to dedicate time to publish research findings is 
therefore not entirely unexpected. 

In an era of evidence-based medicine (EBM), the most 
sophisticated practice of EBM requires, in turn, a clear de- 
lineation of relevant clinical questions, a thorough search 
of the literature relating to the questions, a critical apprai- 
sal of available evidence and its applicability to the clinical 
situation, and a balanced application of the conclusions to 
the clinical problem. The balanced application of the
evidence (i.e., the clinical decision making) is the central
point of practicing evidence-based medicine and involves,
according to EBM principles, integration of our clinical
expertise with patient preferences and values, and with the
best available research evidence.

To optimize the translation of evidence into practice, we 
require high-quality research. This textbook is the culmi- 
nation of years of experience conducting research and pro- 
moting evidence-based practice. We provide rationale for 
why we need clinical research, provide principles for clin- 
ical research, and guide readers to practical aspects in the 
practice of research. Throughout the text, we have main- 
tained a standardized approach that includes a number 
of innovative strategies to make the book readable. Jargon 
Simplified sections provide quick, on-the-spot definitions 
of terminology that may not be familiar to most readers. 
Key Concepts sections highlight the most important issues 
covered within the chapter and within the field of clinical 
research. Examples from the Literature provide real exam- 
ples of research to provide practical context to chapters. 
Our text provides the key principles of research design
and execution and is unique in its approach of “practical
advice for working clinical researchers.”


Mohit Bhandari 
Anders Joensson 
Historical Perspectives of Clinical Research


“The process of scientific discovery is, in effect, a continual flight from wonder.”

– Albert Einstein


Summary


A brief history of clinical research, from the first human
experiments to today’s patient-centered, randomized
controlled clinical trials, is presented.


Introduction


Clinical research has evolved over the centuries from basic
controlled experiments to the sophisticated randomized,
double-blind controlled clinical trials overseen by national
safety and data monitoring boards. Although the adage “the
patient comes first” can be questioned historically, in this
first decade of the 21st century there can be no doubt.
International and national laws and standards have been
put in place to guarantee the health, safety, and welfare
of the patient taking part in clinical research. A brief
overview of the historical process is presented here.


Controlled Experiments


Slotki describes a nutritional experiment involving a
control group in the Book of Daniel from the Old Testament.
The passage from Daniel describes not only a control
group, but also a concurrent control group, a fundamental
element of clinical research that gained wider acceptance
only in the latter half of the 20th century. Although it
may not be possible to confirm the accuracy of the account
from the Old Testament, it is clear that the ideas existed
around 150 BC when this passage was written. There appear
to be few other recorded examples of thinking in
comparative terms about the outcomes of medical treatment
in ancient or medieval times. One, provided by Lilienfeld
and also cited by Witkosky, is a fourteenth-century letter
from Petrarch to Boccaccio.

Petrarch wrote that if a hundred or a thousand men of
the same age, temperament, and habits, and in the same
surroundings, were attacked at the same time by the
same disease, and if one half followed the prescriptions
of doctors and the other half took no medicine but relied
on Nature’s instincts, he had no doubt as to which half
would escape. Given his distrust of the physicians of his
day, he evidently meant the half that took no medicine.

Packard describes the experiment of the renowned
French surgeon Ambroise Paré, who during a battle in
1537 applied the then recognized standard treatment
for gunshot wounds and poured boiling oil over them. After
running out of oil, he used a mixture of egg yolks, oil of
roses, and turpentine. Upon visiting his patients the next
day, he found that those who had received the new
medication had little pain, their wounds had neither swollen
nor become inflamed, and the patients had slept through
the night. The patients who had received boiling oil,
however, had fever with much pain and swollen wounds. He
immediately abandoned the boiling oil treatment.

An early proposal for a randomized trial comes from the
Belgian chemist Van Helmont. Van Helmont suggested
taking 200 or 500 poor people that have fevers, pleurisies,
and other ailments out of hospitals, camps, and elsewhere
and dividing them into halves by casting lots, one half to
be treated by him and the other half by another.
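
Dividing patients into halves by casting lots is, in modern
terms, simple random allocation. The following is a minimal
Python sketch under stated assumptions: the function name
`allocate_by_lot`, the patient labels, and the fixed seed are
illustrative only and do not come from the historical account.

```python
import random


def allocate_by_lot(patients, seed=None):
    """Split patients into two arms by a random shuffle.

    Hypothetical helper for illustration; with an odd number of
    patients the extra one goes to the second arm.
    """
    rng = random.Random(seed)   # seeded only so the example is repeatable
    shuffled = list(patients)   # copy, so the caller's list is untouched
    rng.shuffle(shuffled)       # the modern equivalent of casting lots
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# 200 hypothetical patients, the smaller of the two numbers in the proposal
patients = [f"patient-{i:03d}" for i in range(1, 201)]
arm_a, arm_b = allocate_by_lot(patients, seed=1)
print(len(arm_a), len(arm_b))
```

Because chance alone decides who lands in which arm, neither
the physician nor the patient can steer the healthier cases
toward one treatment.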

In 1747, Lind evaluated 12 patients with scurvy and
their responses to various interventions. This is perhaps
the most famous historical example of a planned,
controlled, comparative clinical trial. He found that those
who ate oranges and lemons, natural sources of vitamin
C, showed the most improvement. Similar comparative
studies were conducted through the 1800s on the effect
of drugs or vaccines in the treatment of smallpox,
diphtheria, and cholera. Of note is the work of
Pierre-Charles-Alexandre Louis, a 19th-century clinician and
pathologist, who introduced the applicability of statistics
to medical practice. He pioneered the idea of comparing
the results of treatments on groups of patients with similar
degrees of disease, that is, “like with like.”

In 1863, Austin Flint (1812-1886) conducted the first
trial to compare the efficacy of a dummy remedy (a placebo)
with that of an active treatment, although he did not test
the two against each other within the same trial. Even so,
this was a significant departure from the (then) customary
practice of contrasting the consequences of an active
treatment with what Flint described as “the natural history
of [an untreated] disease.”

The literature of the early 1900s consisted primarily of
studies focused on prophylaxis and the treatment of
infectious diseases. However, in 1947, the design of clinical
trials entered a new, progressively more scientific phase
spearheaded by the British Medical Research Council (MRC),
with the first randomized controlled clinical trial,
which evaluated the effect of streptomycin on tuberculosis.
This was the first formal study to randomly assign
patients to experimental and control groups.


Randomized Trials


Precursors of the randomized trial date back to at least
1898, when Fibiger used systematic assignment to allocate
diphtheria patients to serum treatment or an untreated
control group. Amberson and McMahon in 1931 also used
group randomization in a trial of sanocrysin for the
treatment of pulmonary tuberculosis. However, systematic
(alternating) assignment was discredited shortly thereafter:
knowledge of the upcoming treatment assignment introduced
selection bias and called into question the treatment
results. New methods of assigning patients to treatment
groups were clearly needed to offset bias and the subsequent
invalidation of study data. In the 1947 MRC study mentioned
above, Bradford Hill used random sampling numbers in
assigning treatment to subjects in the study of streptomycin
in pulmonary tuberculosis. Under the auspices of the MRC,
he continued with further randomized trials: chemotherapy
of pulmonary tuberculosis in young adults, antihistamine
drugs in the prevention and treatment of the common cold,
cortisone and aspirin in the treatment of early cases of
rheumatoid arthritis, and long-term anticoagulant therapy
in cerebrovascular disease. These early randomized clinical
trials helped to transform clinical research into a rigorous
science with formal methods that minimize investigator
bias, design flaws, and subjective evaluation of treatment
data.

The U.S. National Institutes of Health (NIH) started its
first randomized trial in 1951, a study comparing
adrenocorticotropic hormone (ACTH), cortisone, and
aspirin in the treatment of rheumatic heart disease.
This was followed in 1954 by a randomized trial of
retrolental fibroplasia (now known as retinopathy of
prematurity). Since World War II, the prospective randomized
controlled clinical trial has become the gold standard for
evaluating new practices and therapeutic agents in
medicine. In their 1976 and 1977 companion articles, Peto
et al provide a thorough description of the design and
analysis of modern clinical trials.


Blind Studies


An important factor in the success of a clinical trial is the
avoidance of any bias in the comparison of the groups.
“Blinding” the investigator, patient, and assessor prevents
an outcome assessment bias. The randomization of patients
avoids possible bias upon treatment allocation;
nevertheless, bias can creep in while the study is in
progress. Both the patients and the doctor may be affected,
by their treatment response and by knowledge of the
treatment given, respectively. For this reason, neither the
patient nor the person evaluating the patient should know
which treatment is given. In this case, the trial is called a
double-blind study. If only the patient is unaware of the
treatment given, the trial is called a single-blind study. In
several fields, including surgery, it is often impossible for a
study to be double-blinded; nevertheless, all trials should
use the maximum degree of blinding that is possible.

Once again, it was Bradford Hill, in the landmark MRC
trial investigating the effect of streptomycin on
tuberculosis, who introduced the concept of blinding. In the
study, two radiologists and a clinician each read the study
patients’ radiographs independently and were unaware of
whether the films were of “C” (control, bed rest alone) or
“S” (streptomycin and bed rest) cases. Recognizing that
blinding and randomization had to be used effectively,
Bradford Hill tried to ensure that judgments were made
without any possible bias, without any overcompensation
for any possible bias, and without any possible accusation
of bias.


Sequential Trials


In a sequential trial, each participant’s results are analyzed
as the data become available. If the superiority of a
treatment is established, if there are adverse treatment
effects, or if it is determined that a treatment difference is
unlikely, the study may be terminated. However, sequential
trials are best used when the outcome of interest is quickly
known. A sequential trial promotes patient safety and
treatment benefits, as well as ensuring study
cost-effectiveness. In the group sequential trial, the number
of interim analyses is usually limited to between three and
six.
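
One reason the number of interim analyses is kept small is
that every unadjusted look at the data is another chance
to stop on a false-positive result. The following is a
minimal simulation sketch in Python, not a method taken from
the trials described here. It assumes two arms with no true
difference, normally distributed outcomes with a known
standard deviation of 1, and an unadjusted two-sided z-test
at the conventional 1.96 threshold; the function name
`false_positive_rate` and all parameter values are
hypothetical.

```python
import math
import random


def false_positive_rate(n_looks, n_per_arm=100, n_trials=2000,
                        z_crit=1.96, seed=0):
    """Estimate how often a trial with NO true effect is declared
    positive when an unadjusted z-test is run at each of n_looks
    equally spaced analyses (the last look is the final analysis)."""
    rng = random.Random(seed)
    # Patients per arm enrolled at each look, e.g. 20, 40, ..., 100
    look_sizes = [n_per_arm * k // n_looks for k in range(1, n_looks + 1)]
    positives = 0
    for _ in range(n_trials):
        sum_a = sum_b = 0.0
        n = 0
        for target in look_sizes:
            while n < target:
                sum_a += rng.gauss(0.0, 1.0)  # arm A outcome, no effect
                sum_b += rng.gauss(0.0, 1.0)  # arm B outcome, no effect
                n += 1
            # Difference in means divided by its standard error (SD = 1)
            z = (sum_a / n - sum_b / n) / math.sqrt(2.0 / n)
            if abs(z) > z_crit:
                positives += 1  # trial stopped on a false positive
                break
    return positives / n_trials


single = false_positive_rate(n_looks=1)
repeated = false_positive_rate(n_looks=5)
print(f"one final analysis:    {single:.3f}")
print(f"five unadjusted looks: {repeated:.3f}")
```

Under these assumptions the single-analysis rate should sit
near the nominal 5%, while five unadjusted looks give a
clearly higher rate. Group sequential designs address this
by using stricter thresholds at each look.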

With the publication in the 1950s of their papers on
sequential clinical trials, Bross and Armitage acknowledged
the need for interim analysis and recognized that such
analyses affect the probability of a type I error. The main
advantage of a sequential trial over a fixed sample size trial
is that when the length of time needed to reach an end
point is short, for example, weeks or months, the sample
size required to detect a substantial benefit from one of
the treatments is smaller. In the 1970s and 1980s, group
sequential methods and stochastic curtailment came about
as solutions to the problems of interim analysis.


Ethical Principles
Informed Consent


In the late 1800s, Charles Francis Withington was an early
advocate of informed patient consent. Withington
acknowledged the possible conflict between the interests
of medical science and those of the individual patient,
“and concluded in favour of the latter's indefeasible
rights.” Sir William Osler in the early 1900s insisted that
informed consent be a matter of procedure in all medical
experiments. Despite this early patient advocacy, however,
informed consent did not become mandatory until the
mid-20th century.


Human Welfare


Human experiments continued to be conducted without
concern for the welfare of the subjects, often prisoners or
the disadvantaged. In the 19th century, researchers in
Russia and Ireland infected people with syphilis and
gonorrhea. In the United States, physicians put slaves into
pit ovens to study the effects of heat stroke, and poured
scalding water over them as an experimental cure for typhoid
fever. Perhaps one of the most infamous medical trials
was the Tuskegee Syphilis Experiment, begun in 1932 by
the U.S. Public Health Service. Close to 400 poor Black
sharecroppers who were afflicted with syphilis were misled
into believing they were being treated when in fact
their disease was allowed to progress in an attempt to
study how its natural progression differed from that of
the White population. In addition to the study being racist
in nature, it also showed an unsettling indifference to
human life: not only were the men victims, but their wives
and unborn children also contracted the disease. The study
continued for 40 years until exposure by the press in
1972. For a history of human experimentation in the
20th century, see Beecher, Freedman, and McNeil.

During the Nazi regime from 1933 to 1945, German doctors
conducted particularly heinous experiments mainly
on people of Jewish descent, but also on Roma, mentally
disabled persons, Russian prisoners of war, and Polish
concentration camp inmates. The Nazi doctors were later
tried for their atrocities at Nuremberg; this led to the
creation of the Nuremberg Code in 1947, the first
international effort to establish a set of ethical principles
to guide clinical research.


Guidelines and Regulations for Clinical Research 


The Nuremberg Code specifies key moral, ethical, and legal 
principles that must be observed in every clinical trial. 
Table 1.1 lists several key sets of international regulations 
and guidelines for clinical research, which provide safe- 
guards for individuals who participate in clinical trials. 

However, it was not until 1955 that the Public Health
Council in The Netherlands adopted a human
experimentation code. In 1964, the World Medical Assembly,
essentially adopting the ethical principles of the Nuremberg
Code, issued the Declaration of Helsinki, which mandated
that consent be “a central requirement of ethical
research.” This declaration has been updated and amended
numerous times: Tokyo (1975), Venice (1983), Hong Kong
(1989), Cape Town (1996), and Edinburgh (2000).


Under U.S. law, each clinical trial must undergo rigorous 
scientific and ethical review during which the benefit to 
society is ascertained and the risks to patients are consid- 
ered. The Department of Health and Human Services 
(DHHS) and the Food and Drug Administration (FDA) 
have established formal guidelines for the conduct of all 
clinical trials performed in the United States. Anyone re- 
ceiving funding from the DHHS must file a statement of as- 
surance of compliance with the DHHS regulations with the 
NIH. The five general ethical norms required by these
regulations for developing a clinical trial include a
scientifically valid research hypothesis and design,
competent investigators, a favorable risk/benefit ratio, and
an equitable selection of subjects.
Table 1.1 Overview of Guidelines and Regulations for Clinical Research

Year | Guideline or regulation | Description
1947 | Nuremberg Code | Basic moral, ethical, and legal concepts for experimentation; certain basic principles must be observed
1964 | Helsinki Declaration | Recommendation to guide the doctor in biomedical research involving human subjects; includes basic principles, medical research combined with professional care, and nontherapeutic biomedical research guidelines
1974 | National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research | Developed the DHHS policy for the protection of human research subjects; includes U.S. Department of Health and Human Services criteria for institutional review board approval of research studies, including requirements for informed consent. This commission developed the 1979 Belmont Report.
1979 | Belmont Report | Ethical principles and guidelines for the protection of human subjects for biomedical and behavioral research; includes boundaries, basic principles, and applications

Source: Data from Beauchamp T, Childress J. Principles of Biomedical Ethics. 2nd ed. New York, NY: Oxford University Press; 1983.


Each participant must be provided with informational
materials about the study in terms a layperson can
comprehend, which include specific details about the
scientific hypothesis, the risk of injury, the probability of
risk, and the care provided if injury occurs. This requirement
furthers knowledgeable and voluntary participation by
patients, confirmed by their written consent.

An institutional review board (IRB) at each participating
site must review research protocols for trials conducted at
multiple institutions. This protocol review process
provides the investigator, the institution, and the patient
with assurance that the research is medically and
ethically sound.

In 1971, the National Cancer Act was enacted; this
legislation was to have an extraordinary impact on clinical
trials. The National Cancer Institute (NCI) was mandated
to pursue basic cancer research and to take responsibility
for the organized application of research results to reduce
the incidence, morbidity, and mortality of cancer. The
creation of comprehensive cancer centers in 1973 was
initiated to promote basic laboratory and clinical cancer
research, innovative cancer treatments based on clinical
trials, as well as the training and education of health care
professionals in conducting comprehensive clinical trials
programs. In 1983, the Community Clinical Oncology
Program was established to enhance and expand the
NCI’s clinical trials programs. This initiative created a
new mechanism for physicians in community practice to
participate in clinical cancer research.


Data Safety and Monitoring in Clinical Trials


Today, an independent Data Safety and Monitoring Board
monitors the accumulating data from randomized clinical
trials for safety and efficacy. The Coronary Drug Project, a
large multicenter trial sponsored by the NIH’s National
Heart Institute, was the first to establish such a committee,
in 1968. This committee was formed following concerns
raised by Thomas Chalmers after a presentation of
interim outcome data by the study coordinators. Known
for promoting the use of meta-analysis in clinical trials and
for evidence-based critical judgments, Dr. Chalmers
contested the premature release of the study outcome data.
As a result, it is now the responsibility of a data safety
and monitoring committee to decide when the accumulating
data warrant changing the study protocol, terminating
the study, and releasing results.


Conclusions


With standardized procedures, clinical trials have evolved
to focus on the informed consent, safety, and well-being of
the patient. The regulation of clinical trials helps to ensure
that the balance between medical progress and patient
safety is maintained.

Nevertheless, in the 21st century, we are facing new
ethical issues surrounding clinical trials. Noting that “the
registration of all interventional trials is a scientific,
ethical and moral responsibility” (see
http://www.who.int/ictrp/en), the World Health Organization
(WHO) has established the International Clinical Trials
Registry Platform, an international database of clinical
trials, for the sake of transparency and in the interests of
greater patient safety. An impetus for the creation of the
registry stemmed from fears that clinical trials may be
carried out among the world’s underdeveloped and
least-educated populations to the benefit of prosperous,
developed nations. In a sense, the issues surrounding clinical
trials today include not only the rights of an individual
patient, but also the rights of a global society.
Evidence-Based Surgery Defined


“It is the mark of an educated mind to be able to entertain a thought without accepting it.”

– Aristotle


Summary


In this chapter, evidence-based medicine (EBM) is defined
and the necessity and challenges of practicing EBM in
surgery are presented. Resources are provided to aid the
surgeon in obtaining the best available evidence.


Introduction


The term “evidence-based medicine” (EBM) first appeared
in autumn 1990 in a document for applicants to the
Internal Medicine Residency Program at McMaster University
that described EBM as an attitude of “enlightened
skepticism” toward the application of diagnostic, therapeutic,
and prognostic technologies. As outlined in the text Clinical
Epidemiology and first described in the literature in the
ACP Journal Club in 1991, the EBM approach to practicing
medicine relies on an awareness of the evidence upon
which a clinician’s practice is based and the strength of
inference permitted by that evidence. The most sophisticated
practice of EBM requires, in turn, a clear delineation of
relevant clinical questions, a thorough search of the
literature relating to the questions, a critical appraisal of
available evidence and its applicability to the clinical
situation, and a balanced application of the conclusions to
the clinical problem. The EBM model integrates research
evidence, clinical circumstances, patients’ values and
preferences, and clinical experience (Fig. 2.1).













[Figure: four labeled elements: clinical circumstances, clinical expertise, research evidence, and patient preferences.]

Fig. 2.1 Current model of evidence-based medicine.


Jargon Simplified: Evidence-Based Medicine
Evidence-based medicine is “the conscientious, explicit,
and judicious use of current best evidence in making
decisions about the care of individual patients. The
practice of evidence-based medicine requires integration of
individual clinical expertise and patient preferences
with the best available external clinical evidence from
systematic research.”


How Evidence-Based Medicine Differs from Traditional Approaches to Health Care 


According to the traditional paradigm, clinicians evaluate
and solve clinical problems by reflecting on their own
clinical experience or the underlying biology and
pathophysiology or by consulting a textbook or local expert.
For many traditional practitioners, reading the Introduction
and Discussion sections of a research article is sufficient
for gaining relevant information, and observations from
day-to-day clinical experience are a valid means of building
and maintaining knowledge about patient prognosis,
the value of diagnostic tests, and the efficacy of treatment.
Because this paradigm places high value on traditional
scientific authority and adherence to standard approaches,
traditional medical training and common sense are assumed
to provide an adequate base for evaluating new tests and
treatments, and content expertise and clinical experience
are considered sufficient to generate guidelines for clinical
practice.

Evidence-based practice posits that although
pathophysiology and clinical experience are necessary, they
alone are insufficient guides for practice. These evidence
sources may lead to inaccurate predictions about the
performance of diagnostic tests and the efficacy of
treatments. Like the traditional approach to health care, the
evidence-based health care paradigm also assumes that
clinical experience and the development of clinical instincts
(particularly with respect to diagnosis) are crucial elements
of physician competence. However, the EBM approach
includes several additional steps. These steps include using
experience to identify important knowledge gaps and
information needs, formulating answerable questions,
identifying potentially relevant research, assessing the
validity of evidence and results, developing clinical policies
that align research evidence and clinical circumstances, and
applying research evidence to individual patients with their
specific experiences, expectations, and values.


Key Concepts: The Five As of Evidence-Based Medicine

1. Ask - Formulate your question.

2. Acquire - Conduct an efficient search for the best
available research evidence.

3. Appraise - Is the evidence you found valid?

4. Apply - Use the best available evidence and decide
whether it is applicable to your specific patient question.

5. Act - When evidence is valid, take what you have
learned back to your patient.
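
Step 1 (Ask) is easier to apply when the question is written
down in a fixed structure that names the population, the
alternatives being compared, and the outcome. The following
is a minimal Python sketch. The `ClinicalQuestion` record
and its field names are hypothetical, and the rule that an
answerable question needs at least two alternatives is a
simplifying assumption of this sketch rather than a formal
requirement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClinicalQuestion:
    """Hypothetical container for the parts of an answerable question."""
    population: str
    interventions: tuple  # the alternatives being compared
    outcome: str

    def is_answerable(self) -> bool:
        # Assumed rule: a population, an outcome, and two or more alternatives
        return bool(self.population and self.outcome) and len(self.interventions) >= 2

    def as_sentence(self) -> str:
        options = " versus ".join(self.interventions)
        return f"In {self.population}, what is the effect of {options} on {self.outcome}?"


# A vague question: no population, one intervention, no outcome
vague = ClinicalQuestion(population="", interventions=("internal fixation",), outcome="")

# A focused question about tibial fracture care
focused = ClinicalQuestion(
    population="patients with open tibial diaphyseal fractures",
    interventions=("external fixators", "nonreamed intramedullary nails"),
    outcome="reoperation rates",
)

print(vague.is_answerable())
print(focused.as_sentence())
```

Writing the question this way makes it obvious which part is
missing before any literature search begins.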


Unfortunately, practicing EBM is not easy. Practitioners
must know how to frame a clinical question to facilitate
use of the literature in its resolution. Typically, a question
should include the population, the intervention, and
relevant outcome measures. The question, “What is the role
of internal fixation of tibial fractures?” is vague. A better
question is “In patients presenting to the emergency room
with open tibial diaphyseal fractures (population), what
is the effect of external fixators versus nonreamed
intramedullary nails (interventions) on reoperation rates
(outcome)?”

EBM practitioners (i.e., clinicians who work under the
EBM paradigm) regularly consult original literature,
including the Methods and Results sections of research
articles. Correctly interpreting literature on prognosis,
diagnostic tests, treatment, and potentially harmful
exposures (medications’ side effects, environmental
exposures) requires an understanding of the hierarchy of
evidence. For example, in making treatment decisions, EBM
practitioners may conduct an n-of-1 randomized trial
(a randomized trial in an individual patient, with the patient
repeatedly treated with active intervention or placebo) to
determine the optimal treatment for an individual patient.
Alternatively, they may seek a systematic review
of randomized trials of treatment alternatives. If a
systematic review is not available, they will look for
individual randomized trials and high-quality observational
studies of relevant management strategies. If the literature
is lacking altogether, EBM practitioners will fall back on the
underlying biology and pathophysiology, and clinical
experience.


Jargon Simplified: n-of-1 Trials

“‘n of 1’ trials are conducted by systematically varying
the management of a patient’s illness during a series of
treatment periods (alternating between experimental
and control interventions) to confirm the effectiveness
of treatment in the individual patient.”


The Need for Evidence-Based Medicine


Over the last several years, the concepts and ideas
attributed to and labeled collectively as evidence-based
medicine have become a part of daily clinical lives, and
clinicians increasingly hear about evidence-based guidelines,
evidence-based care paths, and evidence-based questions
and solutions. The controversy has shifted from whether
to implement the new concepts to how to do so sensibly
and efficiently, while avoiding potential problems
associated with several misconceptions about what EBM is
and what it is not. The EBM-related concepts of hierarchy
of evidence, meta-analyses, confidence intervals, study
design, and so on, are so widespread that clinicians who
wish to understand today’s medical literature have no choice
but to become familiar with EBM principles and
methodologies. The skills associated with EBM should allow
clinicians to function more rationally. The ability to follow
the path from research to application should also provide
more control over what we do, and more satisfaction from our
daily practice. Although learning to locate, assess, and 
use new evidence in the original literature can improve 
our daily practice, limited access to that information and 
limited time allocated to continuing education may cause 
our up-to-date clinical knowledge to deteriorate with 
time. EBM-related skills provide solutions to deal with 
this problem by allowing us to access, appraise, and apply 
information much more efficiently. 

Critics of EBM have mistakenly suggested that EBM
equates evidence with results of randomized trials,
statistical significance with clinical relevance, evidence (of
whatever kind) with decisions, and lack of evidence of
efficacy with evidence for the lack of efficacy. Other critics
argue that EBM is not a tool for providing optimal patient
care, but merely a cost-containment tool. All these
statements represent a fundamental mischaracterization of
EBM.




Challenges to the Practice of Evidence-Based Medicine 


EBM involves a hierarchy of evidence, from meta-analyses 
of high-quality randomized trials showing definitive re- 
sults directly applicable to an individual patient, to relying 
on physiological rationale or previous experience with a 
small number of similar patients. The hallmark of the evi- 
dence-based practitioner is that, for particular clinical de- 
cisions, he or she knows the strength of the evidence, and 
therefore the degree of uncertainty. 

Evidence-based practitioners must know how to search 
the literature efficiently to obtain the best available evi- 
dence bearing on their question. When they have located 
that evidence, they must be able to evaluate the strength 
of the methods of the studies they find, extract the clinical 
message, apply it back to the patient, and store it for retrie- 
val when faced with similar patients in the future. 

Traditionally, these skills have not been taught in either 
medical school or postgraduate training. Although this si- 
tuation is changing, the biggest influence on how trainees 
will practice is their clinical role models, few of whom are 
currently accomplished EBM practitioners. The situation is 
even more challenging for those looking to acquire the re- 
quisite skills after completing their clinical training. 

Clinicians need to decide the extent to which they will 
become EBM practitioners. Individual practitioners who 
understand and can use EBM skills will be able to more ac- 
curately and assuredly judge competing recommendations 
and alternative courses of action. Further, because the ad- 
vent of EBM has meant that traditional sources of authority 
such as age and experience must be supplemented by ex- 
plicit reference to valid and clinically relevant literature, 
learning EBM skills is essential for clinicians who provide 
recommendations regarding optimal practice through 
authoring reviews, editorials, or practice guidelines or 
through assuming the role of clinical educator. Educators 
without such skills risk missing an important tool for com- 
municating with their students. 

Because becoming an EBM practitioner comes at the cost 
of time, effort, and other priorities, clinicians can also seek 
information from sources that explicitly use EBM tools to 
select and present evidence. However, even this alterna- 
tive requires that clinicians possess the skills necessary to 
apply the available evidence to individual patients. For in- 
stance, to help patients weigh the risks and benefits of a 
treatment, clinicians must understand the best estimate 
of the magnitude of the treatment’s effects as well as the 
precision of that estimate. 

Clinicians who do not want to learn the skills necessary 
to evaluate evidence for themselves may also ensure that 
the patient care they provide is consistent with best 
evidence by working in a setting that demands evidence-based 
practice. Such a milieu may include computer-generated 
reminders delivered at the time of the patient’s visit, 
practice audits accompanied by feedback, and financial 
incentives or disincentives. Although this way of 
incorporating EBM into daily practice may be less challenging 
than actively seeking best evidence to guide care decisions, 
it may also be unattractive to many health-care providers and 
excessively limit the choices available to both clinicians and 
patients. In addition, EBM principles may be open to abuse 
in such settings, as in the case of local evidence-based 
practice guidelines that may value cost containment over the 
provision of optimal patient care. 


Examples from the Literature: Challenges to Practi- 
cing Evidence-Based Medicine in Surgery 

Source: Bhandari M, Montori V, Devereaux PJ, Dosanjh S, 
Sprague S, Guyatt GH. Challenges to the practice of evi- 
dence-based medicine during residents’ surgical train- 
ing: a qualitative study using grounded theory. Acad 
Med 2003;78:1183-1190." 

Abstract 

Purpose: To examine surgical trainees’ barriers to im- 
plementing and adopting evidence-based medicine 
(EBM) in the day-to-day care of surgical patients. 
Method: In 2000, 28 surgical residents from various sub- 
specialties at a hospital affiliated with McMaster Univer- 
sity Faculty of Health Sciences in Ontario, Canada, parti- 
cipated in a focus group (n = 8) and semistructured inter- 
views (n = 20) to explore their perceptions of barriers to 
the practice of EBM during their training. Additional 
themes were explored, such as definitions of EBM and 
potential strategies to implement EBM during training. 
The canons and procedures of the grounded theory ap- 
proach to qualitative research guided the coding and 
content analysis of the data derived from the focus group 
and semistructured interviews. 

Results: Residents identified personal barriers, staff- 
surgeon barriers, and institutional barriers that limited 
their ability to apply EBM in their daily activities. Resi- 
dents perceived their lack of education in EBM, time con- 
straints, lack of priority, and fear of staff disapproval as 
major challenges to practicing EBM. Moreover, the lack 
of ready access to surgical EBM resource materials 
proved to be an important additional factor limiting 
EBM surgical practice. Residents identified several stra- 
tegies to overcome these barriers to EBM, including hir- 
ing staff surgeons with EBM training, offering course- 
work in critical appraisal for all staff, improving interde- 
partmental communication, and providing greater flex- 
ibility for EBM training. 

Conclusions: Surgical residents identified a general lack 
of education, time constraints, lack of priority, and staff 
disapproval as important factors limiting incorporation 
of EBM. Curriculum reform and surgeon education 
may help overcome these barriers. 
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Resources for Surgeons 


The practice of evidence-based medicine means integrat- 
ing individual clinical expertise with the best available ex- 
ternal clinical evidence from systematic research. “The 
Users’ Guide to the Medical Literature” that has appeared 
in the Journal of the American Medical Association and 
the Canadian Journal of Surgery, as well as the installments 
of the “Users’ Guide to the Orthopaedic Literature” in the 
Journal of Bone & Joint Surgery and Acta Orthopaedica pro- 
vide clinicians with the tools to critically appraise the 
methodological quality of individual studies and apply 
the evidence. 

To provide clinicians with easy access to the best available 
evidence, several specialized sources include summaries 
of individual studies, systematic reviews, and 
evidence-based clinical guidelines. One such example is the 
Cochrane Database, which is an extensive database of 
systematic reviews on various topics in musculoskeletal 
disease. Additionally, the Cochrane Database contains a 
Controlled Clinical Trial Registry, which provides a 
comprehensive list of randomized clinical trials in orthopaedics 
and other subspecialty areas (both can be accessed at 
http://www.cochrane.org). The Canadian Journal of Surgery, 
Journal of Bone and Joint Surgery, Journal of Orthopaedic 
Trauma, Acta Orthopaedica, and Clinical Orthopaedics 
and Related Research all provide evidence summaries and study 
methodology tips on a variety of topics. 


Conclusions 


Although EBM is sometimes perceived as a blinkered 
adherence to randomized trials, it more accurately involves 
informed and effective use of all types of evidence, but 
particularly evidence from the medical literature, in patient 
care. With the ever-increasing amount of available 
information, surgeons must consider a shift in paradigm from 
traditional practice to one that involves question 
formulation, validity assessment of available studies, and 
appropriate application of research evidence to individual 
patients. 
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Myths and Misconceptions about Evidence-Based Medicine* 


“E's not pinin'! 'E's passed on! This parrot is no more! He has ceased to be! 'E's expired 
and gone to meet 'is maker! 'E's a stiff! Bereft of life, 'e rests in peace! If you hadn't nailed 
‘im to the perch 'e'd be pushing up the daisies! 'Is metabolic processes are now ‘istory! 'E's 
off the twig! 'E's kicked the bucket, 'e's shuffled off 'is mortal coil, run down the curtain 
and joined the bleedin’ choir invisible! THIS IS AN EX-PARROT!” 


Mr. Praline in Monty Python’s Dead Parrot Sketch 


Summary 


In this chapter, the myths and misconceptions associated 
with evidence-based medicine (EBM) are described. The 
seven main criticisms and limitations of EBM are (1) EBM 
is not possible without randomized controlled trials 
(RCTs), (2) EBM disregards clinical proficiency, (3) one 
needs to be a statistician to practice EBM, (4) the 
usefulness of applying EBM to individual patients is limited, (5) 
keeping up-to-date and finding the evidence is impossible 
for busy clinicians, (6) EBM is a cost-reduction strategy, 
and (7) EBM is not evidence-based.!> 

Most of the criticisms have their roots in a 
misunderstanding of the concepts of EBM and are discussed in detail 
below. 


Introduction 


Misconceptions are convenient interpretations of the 
principles of evidence-based medicine that arise from a lack 
of knowledge. These false interpretations are in contradiction 
to the true goals of evidence-based medicine: improving 
patient care. 


Key Concepts: Common Misconceptions about Evidence-Based Medicine 


1. Misconception: EBM is not possible without RCTs. 
   Fact: EBM posits the use of the best available evidence in clinical decision-making. 

2. Misconception: EBM disregards clinical proficiency. 
   Fact: EBM is the integration of individual clinical expertise and patient preferences 
   with the best available evidence.’ 

3. Misconception: One needs to be a statistician to practice EBM. 
   Fact: The EBM cycle helps in clinical decision making and does this by assessing the 
   problem, asking relevant questions, acquiring literature, critically appraising the 
   retrieved articles, and finally applying the findings to help the patient. Understanding 
   statistical tests is only a part of this process. 

4. Misconception: The usefulness of applying EBM to individual patients is limited. 
   Fact: A fundamental principle of EBM tells us that evidence from the orthopaedic 
   literature alone can never guide our clinical actions; we always require the inclusion 
   of patients’ values or preferences.”° Individual patient preferences may differ from 
   the evidence available in the literature. 

5. Misconception: Keeping up-to-date and finding the evidence is impossible for busy 
   clinicians. 
   Fact: Realizing the volume of literature in orthopaedics, several resources have been 
   developed and promoted to assist busy clinicians in finding the current best evidence. 

6. Misconception: EBM is a cost-reduction strategy. 
   Fact: EBM is a quality improvement approach; consistently applied, it can reduce costs 
   by reducing inappropriate variation. 

7. Misconception: EBM is not evidence-based. 
   Fact: Evidence-based medicine is perceived as helpful in structuring daily clinical 
   decision making. 


* Adapted from Poolman RW, Petrisor BA, Marti RK, Kerkhoffs GM, Zlowodzki M, Bhandari M. Misconceptions about practicing evidence-based orthopedic 
surgery. Acta Orthop 2007;78:2-11. Reprinted with permission. 
Misconception 1: Evidence-Based Medicine Is Not Possible without Randomized Controlled Trials 


A frequently heard comment during discussions among 
orthopaedic surgeons is “There is no evidence.” In fact, 
this statement is a translation from “No RCTs on this topic 
exist.” Typically other study designs are present in the sur- 
gical literature, especially as not all interventions are suita- 
ble for evaluation in an RCT. 


Jargon Simplified: Randomized Controlled Trial 

A randomized controlled trial is an “experiment in 
which individuals are randomly allocated to receive or 
not receive an experimental preventative, therapeutic, 
or diagnostic procedure and then followed to determine 
the effect of the intervention.”’ 


Therefore, evidence-based orthopaedic surgery posits the 
use of the best available evidence in clinical decision mak- 
ing. The term “best evidence” assumes that a hierarchy of 
evidence must exist. Sackett and colleagues proposed a 
hierarchy with large randomized trials on the top and opi- 
nion at the bottom (described in Chapter 6).”° Orthopaedic 
surgeons have modified this initial description for use in 
journals such as the Journal of Bone and Joint Surgery 
American Volume and Clinical Orthopaedics and Related 
Research.®° 

What is the best available evidence? Data derived from 
RCTs is considered to be the highest level of evidence 
mainly because randomization is the best way to balance 
known, and the only way to balance unknown, prognostic 
factors within both treatment and control groups in a ther- 
apeutic study.”® Although rated as level I, an RCT can still 
have methodological flaws.'° Surgical trials, for example, 
have several important issues that differentiate them 
from trials of drug therapies. Common questions raised re- 
garding orthopaedic trials include “Were all surgeons 
equally skilled at performing the techniques in the study? 
Were the techniques ‘specialized’ or are they techniques 
general orthopaedic surgeons should be able to perform? 
If technique A is better than B, and a surgeon does techni- 
que B are they now required to learn and do technique A?” 
These questions can threaten how the results of these stu- 
dies are interpreted by surgeons and subsequently how 
they are integrated or not integrated into clinical prac- 
tice." 


Key Concepts: Common Questions Raised Regarding 
Orthopaedic Trials 

• Were all surgeons equally skilled at performing the 
  techniques in the study? 

• Were the techniques “specialized” or are they techniques 
  general orthopaedic surgeons should be able to 
  perform? 

• If technique A is better than B, and a surgeon does 
  technique B, is the surgeon now required to learn 
  and do technique A? 


It is also important to recognize that not all clinical ques- 
tions can be answered with an RCT. Although randomiza- 
tion in RCTs can be stratified based on prognostic factors, it 
would be in some cases unethical to actively randomize 
patients to certain types of prognostic or risk factors. For 
example, it would clearly be unacceptable to randomize 
consecutive patients to alcohol or to no alcohol to deter- 
mine if alcohol negatively affects fracture healing. Prog- 
nostic factors of a disease or intervention can be assessed 
with a cohort study design, which then provides the highest 
level of evidence without being an RCT.!? There are 
other situations where an RCT may not be feasible, for 
example, when the sample size required is too large or the 
follow-up requires many years. 

In orthopaedic surgery, “lesser” forms of evidence have 
provided many insights that would have been impossible 
with RCTs.” Some investigators argue that a well-designed 
nonrandomized study can effectively provide the same re- 
sults as randomized trials.!^!6 It has been shown in the 
orthopaedic literature, however, that observational stu- 
dies can over- or underestimate treatment effect." There 
are examples in the literature where clinical practice has 
been changed because of a high-quality RCT or meta-ana- 
lysis.'®!° As a final point, observational series or case 
reviews can generate highly significant hypotheses. 
Although not providing definitive answers for clinical 
practice, they can most definitely set the stage for further 
experimental work. A classic example of an observational 
study of case-control design would be the assessment of 
the effect of exposure to asbestos or other potential 
prognostic factors on the incidence of a type of lung cancer, 
pleural mesothelioma (rare outcome, needs years to 
develop). In this case, one would identify patients with lung 
cancer and a control group without lung cancer and 
retrospectively assess the patients’ exposure to asbestos and 
other risk factors in both groups.”° 
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Examples from the Literature: An Example of a Cohort 
Study 

Source: Jarvholm B, Englund A, Albin M. Pleural 
mesothelioma in Sweden: an analysis of the incidence 
according to the use of asbestos. Occup Environ Med 
1999;56:110-113.”° 

Abstract 

Objective: To investigate if the preventive measures ta- 
ken to reduce the occupational exposure to asbestos 
have resulted in a decreased incidence of pleural me- 
sothelioma in Sweden. 

Methods: The incidence of pleural mesothelioma be- 
tween 1958 and 1995 for birth cohorts born between 
1885 and 1964 was investigated. The cases of pleural 
mesothelioma were identified through the Swedish 
Cancer Register. 

Results: In 1995, around 80 cases of pleural mesothe- 
lioma could be attributed to occupational exposure to 
asbestos. There is an increasing incidence in more recent 
birth cohorts in men. The incidence was considerably 
higher in the male cohort born between 1935 and 
1944 than in men born earlier. 

Conclusions: The annual incidence of pleural mesothe- 
lioma attributable to occupational exposure to asbestos 
is today larger than all fatal occupational accidents in 
Sweden. The first asbestos regulation was adopted in 
1964 and in the mid-1970s imports of raw asbestos de- 
creased drastically. Yet there is no obvious indication 
that the preventive measures have decreased the risk 
of pleural mesothelioma. The long latency indicates 
that the effects of preventive measures in the 1970s 
could first be evaluated around 2005. 


This aids in further research and has provided information 
in a relatively short time with low cost. These points illus- 
trate the continued value of experimental and observa- 
tional design. Unfortunately, there is the reality of publica- 
tion bias, which is the tendency of investigators, reviewers, 
and editors to submit or accept manuscripts for publication 
based on the direction or strength of the study findings.?!-~4 


Jargon Simplified: Publication Bias 

“Publication bias occurs when the publication of re- 
search depends on the direction of the study results 
and whether they are statistically significant.” 


This, at times, makes finding all available evidence diffi- 
cult, which can influence clinical decision making. Pub- 
lished reports describing complications, adverse events, 
technique or hardware failure, and mistakes in concepts 
can prevent repetitive studies of unsuccessful procedures, 
thus protecting patients.”°° 


Examples from the Literature: An Example of a Retro- 
spective Study Describing Complications 

Source: Nilsen AR, Wiig M. Total hip arthroplasty with 
Boneloc: loosening in 102/157 cases after 0.5-3 years. 
Acta Orthop Scand 1996;67:57-59.7> 

Abstract: We report the outcome of 177 consecutive 
primary Charnley total hip arthroplasties inserted with 
Boneloc cement between November 1991 and Novem- 
ber 1993. There were 107 women and 70 men. The 
mean age at the time of the operation was 71 years. 11 
patients (13 hips) died during the follow-up period 
and 3 patients were too weak to attend a follow-up ex- 
amination. Of the 161 remaining hips, 4 had been re- 
vised because of deep infection. The mean follow-up 
time for the remaining 157 hips was 2 (0.5-3) years. 
24 hips had been revised and 6 are waiting for revision 
because of stem loosening. Of the remaining 127 hips, 
72 showed radiographic signs of stem loosening and 2 
hips were probably loose. Osteolysis was seen around 
the femoral component in 56 hips. 


For rare orthopaedic conditions, it will take too long to find 
a sufficient sample size. Here again, case series may be 
helpful as the best available evidence to help in clinical de- 
cision making. 


Misconception 2: Evidence-Based Medicine Disregards Clinical Proficiency 


Surgeons are afraid to lose autonomy in clinical decision 
making with the increasing amount of evidence-based 
guidelines. Following those guidelines has been referred 
to as “cookbook medicine,” which prevents surgeons 
from using their own “recipe.”*?” This myth stands in con- 
tradiction with the definition of EBM, which is the integra- 
tion of individual clinical expertise and patient preferences 
with the best available evidence.” The practice of EBM 
needs to foster within the surgeon an attitude of empow- 
erment, not managing patients according to “the way 
we've always done it,” but based on currently available 
best evidence. This requires an understanding of how to 


find the literature, an appreciation of different study de- 
signs and hierarchies of evidence, and how to incorporate 
study results into practice. 

Furthermore, evidence applies directly only to the patient 
population from which it is derived. It is 
therefore necessary to understand issues of applicability 
and generalizability of study results to a specific patient 
or patient population. Let us take the example of a patient 
with a displaced intertrochanteric hip fracture who also 
has dementia. It may be that the best available evidence 
on a specific treatment includes only patients without 
dementia, or only those with undisplaced fractures. It would 
then be necessary to extrapolate the results from those 
studies to our patient if possible. Although we may have 
some evidence to suggest which treatment to use in the 
study patients, clinical expertise helps in making decisions 
regarding the generalizability of those results.478 As 
pointed out clearly by Petersen et al,”° results from an RCT 
apply only to the cohort of patients that consented. We 
are often uninformed about the patient characteristics of 
the “nonconsenters.” The findings of Petersen’s study urge 
caution about the generalizability of even level I studies. Indeed, it 
was Sackett who said, “Without clinical expertise, practice 
risks becoming tyrannized by evidence, for even excellent 
external evidence may be inapplicable to or inappropriate 
for an individual patient.” But he goes on to say, “Without 
current best evidence, practice risks becoming rapidly out of 
date, to the detriment of patients.””” 


Misconception 3: One Needs to Be a Statistician to Practice Evidence-Based Medicine 


Being an EBM practitioner does not mean that one has to 
be a statistician. However, understanding basic terminol- 
ogy is an important step toward the effective use of ortho- 
paedic literature. Common terminology requisite to the 
practice of EBM is provided in Table 3.1.7” Moreover, 
the EBM focus on study methodology and statistics is only 
one part of optimal study design and conduct. 

Studies with a positive treatment effect and signifi- 
cant P values are often seen in the literature. This statisti- 
cally significant treatment effect at times can be confused 
with a clinically important treatment effect. This may or 
may not be true. For example, a randomized trial of 1000 
patients may report a statistically significant improvement 
in patient functional scores (3-point difference on a 100- 
point scale) following operative versus nonoperative treat- 


ment of calcaneal fractures. However, many would argue 
that this difference is not clinically relevant and may not af- 
fect surgeon practice. Significant P values have been shown 
to influence the perception of surgeons regarding the im- 
portance of an article.*” More important, it may be neces- 
sary to think in terms of the confidence interval (CI). The CI 
overcomes the limitations of the P value by providing infor- 
mation about the size and direction of the effect, and the 
range of values for the treatment effect that remain consis- 
tent with the observed data.” 
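The distinction between statistical and clinical significance can be made concrete with a short calculation. The sketch below is illustrative only: the standard deviation (20 points), the equal split of 500 patients per group, and the threshold for a clinically important difference (10 points) are assumptions chosen for the example, not values from any trial, and the test uses a large-sample normal approximation.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def compare_means(diff, sd, n_per_group, z_crit=1.96):
    """Two-sided z-test and 95% CI for a difference in means.

    Assumes two equal-sized groups, a common standard deviation,
    and a sample large enough for the normal approximation.
    """
    se = sd * math.sqrt(2.0 / n_per_group)      # standard error of the difference
    z = diff / se
    p = 2.0 * (1.0 - normal_cdf(abs(z)))        # two-sided P value
    ci = (diff - z_crit * se, diff + z_crit * se)
    return p, ci

# Hypothetical inputs: 3-point difference on a 100-point scale,
# SD of 20 points, 500 patients per group (1000 in total).
p, ci = compare_means(diff=3.0, sd=20.0, n_per_group=500)
mcid = 10.0  # hypothetical minimal clinically important difference

print(f"P value: {p:.3f}")
print(f"95% CI: {ci[0]:.2f} to {ci[1]:.2f} points")
print("Statistically significant:", p < 0.05)
print("CI reaches the clinically important threshold:", ci[1] >= mcid)
```

Under these assumptions the P value falls below 0.05, yet the whole confidence interval lies well under the assumed threshold, which is the situation the calcaneal fracture example describes.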

Despite the incorrect overemphasis of statistics in EBM, 
Guyatt and Sackett, the forefathers of EBM, remind us that 
practicing evidence-based medicine starts with the pa- 
tient and ends with the patient.”” 


Table 3.1 Common Terminology Requisite to the Practice of Evidence-Based Orthopaedic Surgery 


Term                      Explanation 

Study power               In a comparison of two interventions, the ability to detect a difference 
                          between the two experimental conditions if one in fact exists. 

Alpha error               The probability of erroneously concluding there is a difference between two 
                          treatments when there is no difference. Typically, investigators decide on 
                          the chance of a false positive result they are willing to accept when they 
                          plan the sample size for a study. 

Beta error                The statistical error (said to be “of the second kind” or type II) made in 
                          testing when it is concluded that something is negative when it really is 
                          positive. Beta error is often referred to as a false-negative. 

P value                   The probability that results as or more extreme than those observed would 
                          occur if the null hypothesis was true and the experiments were repeated 
                          over and over 

Confidence interval       Range of two values within which it is probable that the true value lies 
                          for the entire population of patients from whom the study patients were 
                          selected 

Effect size               The difference in the outcomes between the intervention and control groups 
                          divided by some measure of the variability, typically the standard 
                          deviation 

Number needed to treat    The number of patients who need to be treated during a specific period to 
(NNT)                     prevent one bad outcome. When discussing number needed to treat it is 
                          important to specify the treatment, its duration, and the bad outcome 
                          being prevented. It is the inverse of the absolute risk reduction. 

Relative risk             Ratio of the risk of an event among an exposed population to the risk 
                          among the unexposed 

Odds                      Ratio of probability of occurrence to nonoccurrence of an event 

Odds ratio                Ratio of the odds of an event in an exposed group to the odds of the same 
                          event in a group that is not exposed 


Source: Data from Bhandari M, Tornetta P III, Guyatt GH. Glossary of evidence-based orthopaedic terminology. Clin Orthop Relat Res 
2003;413:158-163. 
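Several of the terms in Table 3.1 can be computed from a simple two-by-two table of outcomes. The following is a minimal sketch using invented counts (10 bad outcomes among 100 treated patients versus 20 among 100 controls); the function name and the numbers are hypothetical and serve only to show how the definitions relate to one another. Here the treated group plays the role of the "exposed" population.

```python
def two_by_two_summary(events_treated, n_treated, events_control, n_control):
    """Relative risk, odds ratio, absolute risk reduction, and NNT.

    Assumes every group has at least one event and one non-event,
    and that the two risks differ (otherwise NNT is undefined).
    """
    risk_t = events_treated / n_treated
    risk_c = events_control / n_control
    odds_t = events_treated / (n_treated - events_treated)
    odds_c = events_control / (n_control - events_control)
    arr = risk_c - risk_t  # absolute risk reduction
    return {
        "relative_risk": risk_t / risk_c,
        "odds_ratio": odds_t / odds_c,
        "absolute_risk_reduction": arr,
        "number_needed_to_treat": 1.0 / arr,  # inverse of the ARR
    }

# Hypothetical counts, not taken from any study.
summary = two_by_two_summary(events_treated=10, n_treated=100,
                             events_control=20, n_control=100)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With these counts the relative risk and the odds ratio differ noticeably even though both describe the same data, which is one reason it matters which measure a report uses.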
Misconception 4: The Usefulness of Applying Evidence-Based Medicine to Individual Patients Is Limited 


A fundamental principle of EBM tells us that evidence from 
the orthopaedic literature alone can never guide our clin- 
ical actions; we always require the inclusion of patients’ 
values or preferences.” Individual patient preferences 
may differ from the evidence available in the literature. 
A relatively new concept is “evidence-based patient 
choice.”?!?? It describes two movements in Western health 
care systems: (1) the increasing demand for evidence- 
based information and (2) the centrality of individual pa- 
tient choices and values in medical decision making?! 
Today, surgeons are not the only ones overloaded with in- 
formation. Patients also have an abundance of information 
from a variety of resources, most commonly the Internet. 
An evidence-based approach to surgery limits patients’ op- 
tions to choose from proven therapies.?” Newer therapeu- 
tic options whose effectiveness is not backed up by evi- 
dence in the literature might therefore not be presented 
to the patient. To help patients in making the right decision 
for them, surgeons must be able both to know and critically 
appraise the literature. As Haynes puts it, “Evidence does 
not make decisions, people do.”** For example, a recent 
meta-analysis on intracapsular hip fractures shows a 
significantly lower reoperation rate with arthroplasty 
than with internal fixation (relative risk = 0.23, 95% CI = 
0.13 to 0.42). However, there was a trend (relative risk = 
1.27, 95% CI = 0.84 to 1.92) toward an increase in mortality 
with hemiarthroplasty.* Later, this trend was disputed by 
a newer meta-analysis.” Although this evidence suggests 
arthroplasty would be the preferred choice for treating 
patients with displaced femoral neck fractures because of a 
lowered reoperation rate, patients may favor internal 
fixation devices for personal reasons. For instance, they may 
fear a potentially increased risk of mortality with 
arthroplasty (a patient-important outcome) or have had previous 
personal experiences leading them toward one decision or 
the other. A specific patient also may not fit the profile of 
those studied within the meta-analysis. This illustrates 
the importance of patient values, clinical acumen, and 
best evidence.*® 
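The two relative risks quoted above show how a confidence interval separates a significant effect from a "trend". A rough way to see this is to check whether the interval includes 1 (no effect) and to back-calculate an approximate P value from the interval. The sketch below assumes the intervals were computed on the logarithmic scale with a normal approximation, which is a common convention but is an assumption here; the resulting P values are approximations and are not figures reported by the meta-analysis.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def interpret_relative_risk(rr, lower, upper, z_crit=1.96):
    """Check whether a 95% CI includes 1 and approximate the P value.

    Assumes the CI is symmetric on the log scale (normal approximation).
    """
    se_log = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(rr) / se_log
    p_approx = 2.0 * (1.0 - normal_cdf(abs(z)))
    includes_one = lower <= 1.0 <= upper
    return includes_one, p_approx

# Values as quoted in the text.
reop_includes_one, reop_p = interpret_relative_risk(0.23, 0.13, 0.42)
mort_includes_one, mort_p = interpret_relative_risk(1.27, 0.84, 1.92)

print("Reoperation: CI includes 1?", reop_includes_one, f"(approx. P = {reop_p:.2g})")
print("Mortality:   CI includes 1?", mort_includes_one, f"(approx. P = {mort_p:.2g})")
```

The reoperation interval excludes 1, whereas the mortality interval spans it, so the data are compatible with both a decrease and an increase in mortality.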


Misconception 5: Keeping Up-to-Date and Finding the Evidence Is Impossible for Busy Surgeons 


Opponents of EBM argue that practicing EBM is not easy.” 
This is true. The number of publications in the orthopaedic 
and surgical literature is growing. Textbooks are still fre- 
quently used by orthopaedic surgeons as reference stan- 
dards, although the presented information often is out- 
dated when the book goes to press.” Realizing the volume 
of literature in orthopaedics, several resources have been 
developed and promoted to assist busy clinicians in finding 
the current best evidence. 

Database sources like the Cochrane Database of Systematic 
Reviews (www.cochrane.org) or “Clinical Queries” in 
PubMed preferentially identify systematic reviews and 
have built-in filters to find randomized trials. Clinical 
Queries in PubMed applies evaluated search strategies to 
your search to prevent an abundance of irrelevant hits.° 
The standard PubMed search for hip AND fracture* AND 
arthroplasty* would result in 2289 citations. The Clinical 
Queries feature limits the search to 66 articles. To keep 
up-to-date one can register for “My NCBI” for free. This 
enables an e-mail service, which sends all new trials 
matching your search on a daily, weekly, or monthly basis to your 
e-mail address. These strategies can dramatically reduce 
the time required to identify quality research. The same 
Clinical Queries feature can be used to find relevant 
systematic reviews and keep up-to-date with them. Recently, 
a postpublication clinical peer-review system was 
introduced to help busy clinicians identify relevant and 
newsworthy publications.“ (Please see Chapter 27 for details 
on literature searches.) 
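For readers who prefer to script their searches, the same idea of narrowing a broad query with a filter can be sketched against the NCBI E-utilities ESearch endpoint. This is a hedged illustration, not a description of how Clinical Queries works internally: the helper names are hypothetical, the `systematic[sb]` subset tag is used here as one example of a PubMed filter, and the code only builds the request URL without contacting the server. Citation counts returned today will differ from those quoted in the text.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (the code below only builds a URL).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_pubmed_query(terms, filter_tag=None):
    """Join search terms with AND and optionally append a filter tag."""
    query = " AND ".join(terms)
    if filter_tag:
        query = f"({query}) AND {filter_tag}"
    return query

def build_esearch_url(query, retmax=20):
    """Return an ESearch URL for the given PubMed query string."""
    params = {"db": "pubmed", "term": query, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"

terms = ["hip", "fracture*", "arthroplasty*"]
broad = build_pubmed_query(terms)                        # unfiltered search
narrow = build_pubmed_query(terms, "systematic[sb]")     # assumed filter tag
url = build_esearch_url(narrow)

print(broad)
print(narrow)
print(url)
```

Saving the narrowed query in a “My NCBI” account, as described above, achieves the same effect without any scripting.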

Preappraised resources such as EBM reviews in the Jour- 
nal of Bone and Joint Surgery American Volume, Journal of 
Orthopaedic Trauma, and the Canadian Journal of Surgery 
are just a few options for surgeons. Additional preap- 
praised resources include the ACP Journal Club, Evidence- 
based Medicine, Up to Date, and Bandolier. These resources 
have conducted the searches, summarized the results, and 
provided user-friendly summaries and bottom line conclu- 
sions for orthopaedic surgeons. Table 3.2 features Web 
sites relevant to practicing evidence-based orthopaedic 
surgery. 




Table 3.2 Web Sites Relevant to Practicing Evidence-Based Orthopaedic Surgery 


Name                                          Web Site 

Bandolier                                     http://www.jr2.ox.ac.uk/bandolier/booth/booths/bones.html 
Best Bets                                     www.bestbets.org 
Cochrane Collaboration                        www.cochrane.org 
Centre for Evidence-Based Medicine, Oxford    http://www.cebm.net 
Evidence-Based On Call                        www.nelh.nhs.uk/eboc.asp 
PubMed Clinical Queries                       http://www.ncbi.nlm.nih.gov/entrez/query/static/clinical.shtml 
Postpublication clinical peer review system   http://plus.mcmaster.ca/raters/stellar.asp 
Up to Date                                    www.uptodate.com 


Misconception 6: Evidence-Based Medicine Is a Cost-Reduction Strategy 


This misconception is closely related to Misconception 2 in 
that it reflects surgeons’ fear of losing the mandate 
on their decisions. Insurance companies and hospital 
managers may use this misconception to their advantage: “No 
evidence: no reimbursement.” However, the true nature 
of EBM is a quality improvement approach; consistently 
applied, it may indeed reduce costs by reducing 
inappropriate variation. Again, the central position of the 
patient in the evidence-based cycle must be stressed; a 
welcome side effect of the systematic approach in decision 
making can be cost reduction. Doctors must be warned that 
EBM terminology may be misused by insurance companies 
against the best interests of the patient. 


Misconception 7: Evidence-Based Medicine Is Not Evidence-Based 


Critics of EBM cite the lack of evidence that EBM approaches 
actually improve patient outcomes. Although this is par- 
tially correct, several reports do suggest that EBM education 
instills greater satisfaction and good retention skills among 
trainees.”! Also, practicing evidence-based medicine is per- 
ceived helpful in structuring daily clinical decision mak- 
ing.4!*8 Although it may be difficult to ascertain the effec- 
tiveness of evidence-based medicine as a whole, it is not so 
difficult to see how some of the parts that make up evi- 
dence-based medicine play a role in patient care. 

The effective and timely implementation of key clinical 
research, that is finding and understanding the best avail- 
able evidence, has been shown on more than one occasion 
to affect patient outcomes. The implementation of evi- 
dence from RCTs and meta-analyses has helped to sepa- 
rate, in some cases, successful therapies from ineffective 
or even harmful treatment modalities where previous 
observational studies (or in some cases, studies lacking sig- 
nificant methodological rigor) failed to exemplify this.” 
Results of randomized controlled trials made surgical in- 
terventions such as vagotomy for the treatment of peptic 
ulcer disease, obsolete.“ Results of randomized controlled 
trials have also provided the basis for what are now routine 
interventions, such as antibiotic prophylaxis in the surgi- 
cal treatment of closed fractures.!® 

In the realm of medical education, increasing use and
understanding of critical appraisal is an essential tool for
the surgical resident as the amount to learn in an
ever-shortening time is growing. Focusing on relevant and
high-quality literature “foreground” questioning can only 
augment background knowledge and sound surgical prin- 
ciples.“ In this instance, aspects of EBM could be viewed as 
epistemology and a resident or trainee can ask the ques- 
tion, “How do we know what we know?” There is also 
some evidence to suggest that incorporation of EBM prin- 
ciples into resident journal clubs and education enhances 
self-assessment abilities as well as the perceived educa- 
tional value of these experiences.°°°* 

These are some examples of how the individual parts of
practicing EBM (asking answerable questions and finding,
appraising, and applying the evidence) make the practice of
EBM a systematic clinical utility that affects clinical
practice and patient outcomes. Attempts have been made to
evaluate the influence of EBM on society in general,°® 
and the effect of evidence-based practice on the quality 
of life of patients specifically.°° However, measurement 
of the effects of EBM on patients’ quality of life remains 
complicated; an evaluation of all individual aspects of the 
practice of EBM is needed.°? 

Evidence-based orthopaedic surgery is part of the evolu- 
tion of science; only time will tell if we are on the right track. 
An ongoing evaluation of the practice of evidence-based 
orthopaedic surgery will be necessary to ensure the highest 
possible evidence and to guarantee excellent patient care. 
Conclusions


Most criticisms of evidence-based orthopaedics are rooted
in myths and misconceptions. Evidence-based orthopaedics
should be perceived as a guide to help in clinical decision
making in busy clinical practices. It must be reinforced
Becoming an Evidence-Based Surgeon


“The EBM-related concepts of hierarchy of evidence, meta-analyses, confidence intervals,
and study design are so widespread, that surgeons, researchers, and health care personnel
willing to use today's surgical literature with understanding have no choice but to become
familiar with evidence-based medicine principles and methodologies.”

— Mohit Bhandari, 2003


Summary


In this chapter, the integration of research evidence into
orthopaedic practice is discussed and strategies are
presented to help orthopaedic surgeons become evidence-based
orthopaedic surgeons.


Introduction


The term evidence-based medicine was coined by Professor
Gordon Guyatt in 1990 in a document for applicants to the
Internal Medicine residency program at McMaster University
in Hamilton, Ontario.’ Professor Guyatt described
evidence-based medicine as an attitude of “enlightened
skepticism” toward the application of diagnostic,
therapeutic, and prognostic technologies.' Evidence-based
medicine was first described in the literature in the ACP
Journal Club in 1991 as an approach to practicing medicine
that relies on an awareness of the evidence upon which a
clinician’s practice is based and the strength of the
inference permitted by that evidence. The current
evidence-based medicine (EBM) model includes the integration
of clinical expertise with patients’ values, clinical
circumstances, and the best available research evidence.34


Jargon Simplified: Evidence-Based Medicine
Evidence-based medicine is “the conscientious, explicit,
and judicious use of current best evidence in making
decisions about the care of individual patients. The
practice of evidence-based medicine requires integration of
individual clinical expertise and patient preferences
with the best available external clinical evidence from
systematic research.”


Implementation of Research Evidence


EBM has fueled the recent increased interest in making
clinical and policy decisions based on research findings.®
The decision whether to implement research evidence depends
on the quality of the research, the degree of uncertainty of
the findings, relevance to the clinical setting, whether the
benefits to the patient outweigh any adverse effects, and
whether the overall benefits justify the costs when
competing priorities and available resources are taken into
account.®


Key Concepts: Implementation of Research Evidence®

“The decision whether to implement research evidence
depends on:

• The quality of the research
• The degree of uncertainty of the findings
• The relevance to the clinical setting
• Whether the benefits to the patient outweigh any adverse
  effects
• Whether the overall benefits justify the costs when
  competing priorities and available resources are taken
  into account”


Implementation of research evidence rarely occurs unless
there are concerted attempts to get the results into
practice.” These attempts are severely constrained by the
limited capacity of health care systems to absorb new research
and the investment necessary to overcome the obstacles of 
getting research into practice.° The anticipated benefits of 
implementation vary according to factors such as the di- 
vergence between research evidence and current practice 
or the pressure of policies that influence the marginal ben- 
efit of further efforts at implementation.® 

The degree to which orthopaedic surgeons regard even
good-quality research as implementable will depend on the
extent to which the results conflict with professional
experience and beliefs. Depending on the perceived risks,
the extent of change required, and the quality and certainty
of the research results, many surgeons, clinicians, and
policymakers will wait for confirmatory evidence.®

When designing studies, it is necessary for investigators 
to consider how and by whom their results will be used.° 
The research design should be sufficiently robust, the
setting sufficiently similar to that in which the results
are likely to be implemented, the outcomes relevant, and the
study size large enough for the results to convince decision
makers of their importance.®


Key Concepts: Criteria for a Research Design®

“The design should be:

• sufficiently robust
• the setting sufficiently similar to that in which the
  results are likely to be implemented
• the outcomes should be relevant
• and the study size large enough for the results to
  convince decision makers of their importance”


Applying the Results to Individual Patients 


Reading a few orthopaedic journals each month is no 
longer sufficient preparation for answering the questions 
that emerge in everyday practice. There are currently 
over 100 journals indexed by MEDLINE whose main sub- 
ject is orthopaedics, sports medicine, or hand surgery.’ 
There are more than 3800 biomedical journals currently in
PubMed, with more than 7300 citations added weekly.’ It is
also important to remember that not all studies are equally
well designed or interpreted and that a
hierarchy of evidence exists. 


Jargon Simplified: Hierarchy of Research Designs 

The hierarchy of research designs reflects the relative 
weight given to various types of research evidence and 
study designs. The results of a randomized controlled 
trial are considered the highest level of evidence. 


Keeping up-to-date with this volume of information requires
an effective system for information triage. Evidence-based
guidelines and systematic reviews, provided they are
well-conducted and regularly updated, offer reliable
overviews based on rigorous identification and appraisal of
the available evidence. These articles come from a variety
of sources, and the most efficient way to find them is by
searching electronic databases and/or the Internet. With
time at a premium, it is important to know where to look and
how to develop a search strategy, or filter, to identify the
evidence most efficiently and effectively. How to conduct a
comprehensive literature search is discussed in Chapter 27.




Jargon Simplified: Evidence-Based Guidelines 
Evidence-based guidelines are “systematically devel- 
oped statements, which assist in decision making about 
appropriate health care for specific clinical conditions. 
They aim: to improve the diagnosis and treatment of a 
particular condition; to reduce variations in medical 
practice and thereby improve the quality of patient 
care in clinical practice; and to encourage further re- 
search. Evidence-based guidelines are based on good re- 
search evidence of clinical effectiveness. They will form 
the basis for the standards against which comparative 
audit will be conducted.”!° 


A rigorously conducted evidence-based guideline, which
has systematically identified and critically appraised
published systematic reviews and randomized controlled
trials, may provide the best evidence for practice when
the question is broad.’ Guidelines from many sources can
be identified through the U.S. National Guideline
Clearinghouse, but this database does not provide a uniform
definition of evidence-based guidelines."

A well-conducted systematic review or meta-analysis
has minimized bias in the identification and selection of
studies for inclusion, and in the presentation of the results.
If a meta-analysis or high-quality systematic review cannot
be identified, or if one is identified that has not been
recently updated, searching should proceed to identify
relevant randomized controlled trials (RCTs). To identify a
systematic review, meta-analysis, or randomized controlled
trial, the most efficient approach is to search one or more
of the available evidence-based resources for health care
research such as the Cochrane Library
(http://www3.interscience.wiley.com/cgi-bin/mrwhome/106568753/HOME),
Clinical Evidence
(http://clinicalevidence.bmj.com/ceweb/conditions/index.jsp),
or Best Evidence.’ These databases are regularly updated, and
because they only contain systematic reviews, meta-analyses,
critical appraisals of reviews, or randomized controlled
trials, the search process is greatly simplified.’ Another
commonly searched and easily accessed database is PubMed (the
National Library of Medicine’s free Web-based version of
MEDLINE; http://www.ncbi.nlm.nih.gov/sites/entrez/).
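
The search sequence described above (a guideline for a broad
question, otherwise a recently updated systematic review or
meta-analysis, otherwise individual RCTs) can be sketched in a
few lines of Python. This is only an illustration under stated
assumptions: the Record type, the choose_evidence function, and
the five-year cutoff for "recently updated" are hypothetical
and do not come from the chapter or from any database.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Record:
    """A hypothetical search hit; not tied to any real database API."""
    title: str
    kind: str            # "guideline", "systematic_review", "meta_analysis", or "rct"
    last_updated: date


def choose_evidence(records: List[Record], broad_question: bool,
                    today: date, max_age_years: int = 5) -> List[Record]:
    """Return the records to read first, following the cascade in the text.

    The max_age_years cutoff is an assumption; the chapter only says
    "recently updated" without giving a number.
    """
    def is_recent(record: Record) -> bool:
        return (today - record.last_updated).days <= max_age_years * 365

    # 1. For a broad question, an evidence-based guideline comes first.
    if broad_question:
        guidelines = [r for r in records if r.kind == "guideline"]
        if guidelines:
            return guidelines

    # 2. Otherwise prefer a recently updated systematic review or meta-analysis.
    syntheses = [r for r in records
                 if r.kind in ("systematic_review", "meta_analysis") and is_recent(r)]
    if syntheses:
        return syntheses

    # 3. Failing that, fall back to the individual randomized controlled trials.
    return [r for r in records if r.kind == "rct"]
```

With an outdated review and a recent trial in the list, the
function falls through to the trial, which mirrors the advice to
proceed to RCTs when no recently updated synthesis is found.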


Jargon Simplified: Systematic Review 

“Systematic reviews are distinct from narrative reviews 
because they address a specific clinical question, require 
a comprehensive literature search, use explicit selection 
criteria to identify relevant studies, assess the methodo- 
logic quality of included studies, explore differences 
among study results, and either qualitatively or quanti- 
tatively synthesize study results.”! 


Jargon Simplified: Meta-Analysis 

A meta-analysis is “an overview that incorporates a 
quantitative strategy for combining the results of several 
studies into a single pooled or summary estimate.”? 
Jargon Simplified: Randomized Controlled Trial

A randomized controlled trial is an “experiment in 
which individuals are randomly allocated to receive or 
not receive an experimental preventative, therapeutic, 
or diagnostic procedure and then followed to determine 
the effect of the intervention.”° 


Whether research evidence can or should be applied to a
specific patient cannot always be deduced straightforwardly
from the research, as results of evaluative studies are
usually given as average effects. Patients may differ from
the average in ways that influence the effectiveness of the
treatment or its impact.® Patients who participate in
clinical trials may not be typical of the people for whom
the treatment is potentially useful; however, it is probably
more appropriate to assume that research findings are
generalizable across patients unless there is strong
theoretical or empirical evidence to suggest that a
particular group of patients will respond differently.° A
key principle of evidence-based medicine is the integration
of individual clinical expertise and patient preferences
with the best available evidence.

The decision whether to use a treatment also depends 
on factors that are specific to the patient. Orthopaedic sur- 
geons will find that research studies that consider a range 
of important outcomes of treatment are more useful than 
those that have only measured a few narrow clinical end 
points. Qualitative research done within robustly de- 
signed quantitative studies will help surgeons and patients 
to better understand and apply the research results. 


Challenges to Practicing Evidence-Based Medicine 


Unfortunately, practicing evidence-based medicine is not
easy. As mentioned previously, evidence-based medicine
involves a hierarchy of evidence, from meta-analyses of
high-quality randomized controlled trials showing definitive
results directly applicable to an individual patient, to
relying on a physiological rationale or previous experience
with a small number of similar patients. The hallmark of
the evidence-based surgeon is that, for particular clinical
decisions, the surgeon knows the strength of the evidence,
and therefore the degree of uncertainty. Surgeons must know
how to frame a clinical question to facilitate use of the
literature in its resolution. Evidence-based orthopaedic
surgeons must know how to search the literature efficiently
to obtain the best available evidence bearing on their
question, evaluate the strength of the methods of the
studies they find, extract the clinical message, apply it
back to the patient, and store it for retrieval when faced
with similar patients in the future.
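
As a rough illustration of how such a hierarchy can be used to
order what one has found, the short Python sketch below ranks
sources from strongest to weakest. The two end points
(meta-analyses of RCTs at the top, physiological rationale at
the bottom) come from the paragraph above; the intermediate
levels and all names (EvidenceLevel, strongest_first) are
assumptions made for the example, not a grading scheme given in
this chapter.

```python
from enum import IntEnum
from typing import List, Tuple


class EvidenceLevel(IntEnum):
    """Lower value = stronger evidence. Middle levels are illustrative assumptions."""
    META_ANALYSIS_OF_RCTS = 1
    SINGLE_RCT = 2
    OBSERVATIONAL_STUDY = 3
    CASE_SERIES = 4
    PHYSIOLOGICAL_RATIONALE = 5


def strongest_first(sources: List[Tuple[str, EvidenceLevel]]) -> List[Tuple[str, EvidenceLevel]]:
    """Sort (label, level) pairs so the strongest design is listed first.

    sorted() is stable, so sources at the same level keep their input order.
    """
    return sorted(sources, key=lambda source: source[1])
```

Knowing where the best available source sits in this ordering
is one simple way to make the strength of the evidence, and so
the degree of uncertainty, explicit for a given decision.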

Traditionally, these skills have not been taught in either 
medical school or postgraduate training. Although this si- 
tuation is changing, the biggest influence on how trainees 
will practice is their clinical role models, few of whom are 
currently accomplished evidence-based practitioners. The 
situation is even more challenging for those looking to ac- 
quire the requisite skills after completing their clinical and 
specialty training. 


Resources on Evidence-Based Orthopaedic Medicine 


There are several publications and courses available to
help orthopaedic surgeons become evidence-based orthopaedic
surgeons. The Users’ Guides to the Medical Literature series
has appeared in the Journal of the American Medical
Association (JAMA) and more recently in a book titled Users’
Guides to the Medical Literature: A Manual for Evidence-Based
Clinical Practice. In addition, several medical, surgical,
and orthopaedic journals have published series on
evidence-based practice, including the Canadian Medical
Association Journal, the Canadian Journal of Surgery, the
Journal of Bone and Joint Surgery American Volume, and
Clinical Orthopaedics and Related Research. The Journal of
Orthopaedic Trauma publishes a section entitled
“Evidence-Based Orthopaedic Trauma,” which aims to provide
readers with a summary of the published literature on a
variety of topics.'* In addition, each Evidence Summary in
the Journal of Orthopaedic Trauma provides a review of one
or more important concepts in health research
methodology.!? These journal series help to provide
orthopaedic surgeons with the tools to critically appraise
the methodological quality of individual studies and apply
the evidence.


Key Concepts: Evidence Resources
• Journal of the American Medical Association (JAMA) Users’
  Guides
— Oxman AD, Sackett DL, Guyatt GH. Users’ guides to the
  medical literature. I. How to get started. Evidence-Based
  Medicine Working Group. JAMA 1993; 270(17):2093-2095
— Guyatt GH, Sackett DL, Cook DJ. Users’ guides to the
  medical literature. II. How to use an article about therapy
  or prevention. A. Are the results of the study valid?
  Evidence-Based Medicine Working Group. JAMA 1993;
  270:2598-2601
— Guyatt GH, Sackett DL, Cook DJ. Users’ guides to the
  medical literature. II. How to use an article about therapy
  or prevention. B. What are the results and will they help
  me in caring for my patients? Evidence-Based Medicine
  Working Group. JAMA 1994; 271:59-63
— Jaeschke R, Guyatt G, Sackett DL. Users’ guides to the
  medical literature. III. How to use an article about a
  diagnostic test. A. Are the results of the study valid?
  Evidence-Based Medicine Working Group. JAMA 1994;
  271:389-391
— Jaeschke R, Guyatt GH, Sackett DL. Users’ guides to the
  medical literature. III. How to use an article about a
  diagnostic test. B. What are the results and will they help
  me in caring for my patients? Evidence-Based Medicine
  Working Group. JAMA 1994; 271:703-707
— Laupacis A, Wells G, Richardson S, Tugwell P. Users’
  guides to the medical literature. V. How to use an article
  about prognosis. Evidence-Based Medicine Working Group.
  JAMA 1994; 272:234-237
— Oxman AD, Cook DJ, Guyatt GH. Users’ guides to the medical
  literature. VI. How to use an overview. Evidence-Based
  Medicine Working Group. JAMA 1994; 272:1367-1371
— Richardson WS, Detsky AS. Users’ guides to the medical
  literature. VII. How to use a clinical decision analysis.
  A. Are the results of the study valid? Evidence-Based
  Medicine Working Group. JAMA 1995; 273(16):1292-1295
— Richardson WS, Detsky AS. Users’ guides to the medical
  literature. VII. How to use a clinical decision analysis.
  B. What are the results and will they help me in caring for
  my patients? Evidence-Based Medicine Working Group. JAMA
  1995; 273(20):1610-1613
— Wilson MC, Hayward RS, Tunis SR, Bass EB, Guyatt GH.
  Users’ guides to the medical literature. VIII. How to use
  clinical practice guidelines. A. Are the recommendations
  valid? Evidence-Based Medicine Working Group. JAMA 1995;
  274:570-574
— Wilson MC, Hayward RS, Tunis SR, Bass EB, Guyatt GH.
  Users’ guides to the medical literature. VIII. How to use
  clinical practice guidelines. B. What are the
  recommendations and will they help you in caring for your
  patient? Evidence-Based Medicine Working Group. JAMA 1995;
  274:1630-1632
— Guyatt GH, Sackett DL, Sinclair JC, Hayward R, Cook DJ,
  Cook RJ. Users’ guides to the medical literature. IX. A
  method for grading health care recommendations.
  Evidence-Based Medicine Working Group. JAMA 1995;
  274:1800-1804
— Naylor CD, Guyatt GH. Users’ guides to the medical
  literature. X. How to use an article reporting variations
  in the outcomes of health services. Evidence-Based Medicine
  Working Group. JAMA 1996; 275:554-558
— Naylor CD, Guyatt GH. Users’ guides to the medical
  literature. XI. How to use an article about a clinical
  utilization review. Evidence-Based Medicine Working Group.
  JAMA 1996; 275:1435-1439
— Guyatt GH, Naylor CD, Juniper E, Heyland DK, Jaeschke R,
  Cook DJ. Users’ guides to the medical literature. XII. How
  to use articles about health-related quality of life.
  Evidence-Based Medicine Working Group. JAMA 1997;
  277(15):1232-1237
— Drummond MF, Richardson WS, O'Brien BJ, Levine M,
  Heyland D. Users’ guides to the medical literature. XIII.
  How to use an article on economic analysis of clinical
  practice. A. Are the results of the study valid?
  Evidence-Based Medicine Working Group. JAMA 1997;
  277:1552-1557
— O'Brien BJ, Heyland D, Richardson WS, Levine M,
  Drummond MF. Users’ guides to the medical literature. XIII.
  How to use an article on economic analysis of clinical
  practice. B. What are the results and will they help me in
  caring for my patients? Evidence-Based Medicine Working
  Group. JAMA 1997; 277:1802-1806
— Dans AL, Dans LF, Guyatt GH, Richardson S. Users’ guides
  to the medical literature. XIV. How to decide on the
  applicability of clinical trial results to your patient.
  Evidence-Based Medicine Working Group. JAMA 1998;
  279:545-549
— Richardson S, Wilson M, Guyatt GH, Cook DJ, Nishikawa J.
  Users’ guides to the medical literature. XV. How to use an
  article about disease probability for differential
  diagnosis. Evidence-Based Medicine Working Group. JAMA
  1999; 281:1214-1219
— Guyatt GH, Sinclair J, Cook DJ, Glasziou P. Users’ guides
  to the medical literature. XVI. How to use a treatment
  recommendation. Evidence-Based Medicine Working Group.
  JAMA 1999; 281:1836-1843
— Barratt A, Irwig L, Glasziou P, et al. Users’ guides to
  the medical literature. XVII. How to use guidelines and
  recommendations about screening. Evidence-Based Medicine
  Working Group. JAMA 1999; 281:2029—
— Randolph AG, Haynes RB, Wyatt JC, Cook DJ, Guyatt GH.
  Users’ guides to the medical literature. XVIII. How to use
  an article evaluating the clinical impact of a
  computer-based clinical decision support system.
  Evidence-Based Medicine Working Group. JAMA 1999;
  282:67-74
— Bucher HC, Guyatt GH, Cook DJ, Holbrook A, McAlister FA.
  Users’ guides to the medical literature. XIX. Applying
  clinical trial results. A. How to use an article measuring
  the effect of an intervention on surrogate end points.
  Evidence-Based Medicine Working Group. JAMA 1999;
  282(8):771-778
— McAlister FA, Laupacis A, Wells GA, Sackett DL. Users’
  guides to the medical literature. XIX. Applying clinical
  trial results. B. Guidelines for determining whether a drug
  is exerting (more than) a class effect. Evidence-Based
  Medicine Working Group. JAMA 1999; 282(14):1371-1377
— McAlister FA, Straus SE, Guyatt GH, Haynes RB. Users’
  guides to the medical literature. XX. Integrating research
  evidence with the care of the individual patient.
  Evidence-Based Medicine Working Group. JAMA 2000;
  283(21):2829-2836
— Hunt DL, Jaeschke R, McKibbon KA. Users’ guides to the
  medical literature. XXI. Using electronic health
  information resources in evidence-based practice.
  Evidence-Based Medicine Working Group. JAMA 2000;
  283(14):1875-1879
— McGinn TG, Guyatt GH, Wyer PC, Naylor CD, Stiell IG,
  Richardson WS. Users’ guides to the medical literature.
  XXII. How to use articles about clinical decision rules.
  Evidence-Based Medicine Working Group. JAMA 2000;
  284(1):79-84
— Giacomini MK, Cook DJ. Users’ guides to the medical
  literature. XXIII. Qualitative research in health care.
  A. Are the results of the study valid? Evidence-Based
  Medicine Working Group. JAMA 2000; 284(3):357-362
— Giacomini MK, Cook DJ. Users’ guides to the medical
  literature. XXIII. Qualitative research in health care.
  B. What are the results and how do they help me care for my
  patients? Evidence-Based Medicine Working Group. JAMA
  2000; 284(4):478-482
— Richardson WS, Wilson MC, Williams JW, Moyer VA,
  Naylor CD. Users’ guides to the medical literature. XXIV.
  How to use an article on the clinical manifestations of
  disease. Evidence-Based Medicine Working Group. JAMA 2000;
  284(7):869-875
— Guyatt GH, Haynes RB, Jaeschke RZ, Cook DJ, Green L,
  Naylor CD, Wilson MC. Users’ guides to the medical
  literature. XXV. Evidence-based medicine: Principles for
  applying the users’ guides to patient care. Evidence-Based
  Medicine Working Group. JAMA 2000; 284(10):1290-1296


• Canadian Journal of Surgery Users’ Guides
— Archibald S, Bhandari M, Thoma A; Evidence-Based Surgery
  Working Group. Users’ guide to the surgical literature: how
  to use an article about a diagnostic test. Can J Surg 2001;
  44(1):17-23
— Urschel JD, Goldsmith CH, Tandan VR, Miller JD;
  Evidence-Based Surgery Working Group. Users’ guide to the
  surgical literature: how to use an article evaluating
  surgical interventions. Can J Surg 2001; 44(2):95-100
— Thoma A, Sprague S, Tandan V; Evidence-Based Surgery
  Working Group. Users’ guide to the surgical literature: how
  to use an article on economic analysis. Can J Surg 2001;
  44(5):347-354
— Hong D, Tandan VR, Goldsmith CH, Simunovic M;
  Evidence-Based Surgery Working Group. Users’ guide to the
  surgical literature: how to use an article reporting
  population-based volume-outcome relationships in surgery.
  Can J Surg 2002; 45(2):109-115
— Birch DW, Eady A, Robertson D, DePauw S, Tandan V;
  Evidence-Based Surgery Working Group. Users’ guide to the
  surgical literature: how to perform a literature search.
  Can J Surg 2003; 46(2):136-141
— Bhandari M, Devereaux PJ, Montori V, Cina C, Tandan V,
  Guyatt GH; Evidence-Based Surgery Working Group. Users’
  guide to the surgical literature: how to use a systematic
  literature review and meta-analysis. Can J Surg 2004;
  47(1):60-67
— Thoma A, Farrokhyar F, Bhandari M, Tandan V; the
  Evidence-Based Surgery Working Group. Users’ guide to the
  surgical literature: how to assess a randomized controlled
  trial (RCT) in surgery. Can J Surg 2004; 47(3):200-208
— Birch DW, Goldsmith CH, Tandan V; Evidence-Based Surgery
  Working Group. Users’ guide to the surgical literature:
  self-audit and practice appraisal for surgeons. Can J Surg
  2005; 48(1):57-62


• Canadian Medical Association Journal Tips for Learning
  and Teaching Series
— Wyer PC, Keitz S, Hatala R, et al. Tips for learning and
  teaching evidence-based medicine: introduction to the
  series. Can Med Assoc J 2004; 171:347-348
— Barratt A, Wyer PC, Hatala R, et al.; The Evidence-Based
  Medicine Teaching Tips Working Group. Tips for learners of
  evidence-based medicine: 1. Relative risk reduction,
  absolute risk reduction and number needed to treat. Can Med
  Assoc J 2004; 171:353-358
— Montori VM, Kleinbart J, Newman TB, et al.; The
  Evidence-Based Medicine Teaching Tips Working Group. Tips
  for learners of evidence-based medicine: 2. Measures of
  precision (confidence intervals). Can Med Assoc J 2004;
  171:611-615
— McGinn T, Wyer PC, Newman TB, Keitz S, Leipzig R,
  Guyatt G; The Evidence-Based Medicine Teaching Tips Working
  Group. Tips for learners of evidence-based medicine:
  3. Measures of observer variability (kappa statistic). Can
  Med Assoc J 2004; 171:1369-1373
— Hatala R, Keitz S, Wyer P, Guyatt G; The Evidence-Based
  Medicine Teaching Tips Working Group. Tips for learners of
  evidence-based medicine: 4. Assessing heterogeneity of
  primary studies in systematic reviews and whether to
  combine their results. Can Med Assoc J 2005; 172:661-665
— Montori VM, Wyer P, Newman TB, Keitz S, Guyatt G; The
  Evidence-Based Medicine Teaching Tips Working Group. Tips
  for learners of evidence-based medicine: 5. The effect of
  spectrum of disease on the performance of diagnostic tests.
  Can Med Assoc J 2005; 173:385-390


• Journal of Bone and Joint Surgery American Volume
— Bhandari M, Guyatt GH, Montori V, Devereaux PJ,
  Swiontkowski MF. Users’ guide to the orthopaedic
  literature: how to use a systematic literature review.
  J Bone Joint Surg Am 2002; 84:1672-1682
— Bhandari M, Guyatt GH, Swiontkowski MF. Users’ guide to
  the orthopaedic literature: how to use an article about a
  surgical therapy. J Bone Joint Surg Am 2001; 83:916-926
— Bhandari M, Montori VM, Marc MF, Swiontkowski MF,
  Guyatt GH. Users’ guide to the surgical literature: how to
  use an article about a diagnostic test. J Bone Joint Surg
  Am 2003; 85:1133-1140
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sues in randomized controlled trials of surgical inter- 
ventions. Clin Orthop 2003; 413:25-32 

— Hartz A, Marsh JL. Methodologic issues in observa- 
tional studies. Clin Orthop 2003; 413:33-42 

— Montori VM, Swiontkowski MF, Cook DJ. Methodolo- 
gic issues in systematic reviews and meta-analyses. 
Clin Orthop 2003; 413:43-54 

Symposium: Issues in the Design, Analysis, and Critical 
Appraisal of Orthopaedic Clinical Research: Part II: 
Statistical Issues in the Design of Orthopaedic Studies 

— Bernstein J, McGuire K, Freedman KB. Statistical sam- 
pling and hypothesis testing in orthopaedic research. 
Clin Orthop 2003; 413:55-62 

— Bhandari M, Whang W, Kuo JC, Devereaux PJ, Sprague S,
Tornetta P. The risk of false positive results in
orthopaedic surgical trials. Clin Orthop 2003;413:63-69

— Griffin D, Audige L. Common statistical methods in 
orthopaedic clinical studies. Clin Orthop 2003;413: 
70-79 

Symposium: Issues in the Design, Analysis, and Critical 
Appraisal of Orthopaedic Clinical Research: Part III: 
Outcomes Analysis of Orthopaedic Surgery 

— Jackowski D, Guyatt G. A guide to measurement. Clin 
Orthop 2003;413:80-89 

— Beaton D, Schemitsch E. Measure of health-related 
quality of life and physical function. Clin Orthop 
2003;413:90-105 

— Kocher MS, Henley MB. It is money that matters: de- 
cision analysis and cost-effectiveness analysis. Clin 
Orthop 2003;413:106-116 

Symposium: Issues in the Design, Analysis, and Critical 
Appraisal of Orthopaedic Clinical Research: Part IV: 
Evidence-based Orthopaedics 
— Schunemann H, Bone L. Evidence-based orthopaedics: a
primer. Clin Orthop 2003;413:117-132

— Gillespie LD, Gillespie WJ. Finding current evidence: 
search strategies and common databases. Clin Orthop 
2003;413:133-145 

— Dirschl D, Tornetta P, Bhandari M. Designing, conducting,
and evaluating journal clubs in orthopaedic surgery. Clin
Orthop 2003;413:146-158


• Journal of Orthopaedic Trauma
— Bhandari M, Section Editor, Evidence-Based Orthopaedic
  Trauma. Where's the evidence? Evidence-based orthopaedic
  trauma: a new section in the Journal. J Orthop Trauma 2003;
  17(2):87


Several education courses also exist to train clinicians in 
the practice of evidence-based medicine. Dr. Gordon 
Guyatt chairs an annual workshop on how to teach evi- 
dence-based clinical practice at McMaster University in 
Hamilton, Ontario (http://clarity.mcmaster.ca/). The objec- 
tives of this workshop are to help participants: (1) advance 
their critical appraisal skills, (2) advance their skills in
acknowledging and incorporating values and preferences in
clinical decision making, and (3) learn how to teach evi- 
dence-based clinical practice using a variety of educational 
models. The workshop is offered as a one-week intensive 
course where participants learn in small groups led by 
clinical epidemiologists and practitioners from McMaster 
and other institutions. The workshop consists of small
and large group sessions, individual study time, and
opportunities for workshop participants to lead teaching
sessions that draw on their own ideas, materials, and
experiences.

The Oxford Centre for Evidence-Based Medicine offers a
variety of courses and workshops (http://www.cebm.net/).
Its introductory workshops are aimed at clinicians and
other health care professionals who wish to gain knowledge
of critical appraisal and experience in the practice of
evidence-based health care. Workshops on teaching
evidence-based clinical practice are also offered.

Multiple Web sites provide information on evidence-based
medicine. For example, the Users’ Guide Web site
(http://www.usersguides.org/) is an online tool that guides
clinicians in appraising evidence and applying it in their
everyday practice. Based on the popular Users’ Guides
series in JAMA, it offers state-of-the-art contributions on
evidence-based clinical practice, edited by Professors
Gordon Guyatt, Drummond Rennie, and Robert Hayward,
with contributions from more than 50 of the most renowned
evidence-based medicine educators and practitioners in
the world.


Conclusions


Becoming an evidence-based orthopaedic surgeon is not 
easy and requires sufficient training and dedication. 
More orthopaedic surgeons should be familiar with the 
principles that make up evidence-based orthopaedic sur-


Various Research Design Classifications


“Good doctors use both individual clinical expertise and the
best available external evidence, and neither alone is
enough. Without clinical expertise, practice risks becoming
tyrannised by evidence, for even excellent external evidence
may be inapplicable to or inappropriate for an individual
patient. Without current best evidence, practice risks
becoming rapidly out of date, to the detriment of patients.”

— David Sackett et al., 1996¹


Summary


In this chapter, the different types of research designs that
investigators may report are presented. All research designs
have potential advantages and limitations, and understanding
these issues will allow surgeons to better determine the
strength of evidence behind study results.


Introduction


Evidence-based practice involves finding the available
evidence, assessing its validity, and then using the strongest
available evidence in conjunction with clinical acumen to
inform decisions regarding care. Systematic reviews and
randomized controlled trials represent the highest levels
of evidence, whereas case reports and expert opinion are
the lowest (Fig. 5.1). This hierarchy of evidence was
developed largely for questions related to interventions or
therapy. For questions related to diagnosis, prognosis, or
causation, other study designs such as cohort studies or
case-control studies will often be more appropriate. For these
types of studies, it is useful to think of the various study
designs not as a hierarchy, but as categories of evidence
where the strongest design that is possible, practical, and
ethical should be used. Further, the quality of each
individual study still needs to be assessed for strengths and
weaknesses by exploring the use of methodological
safeguards against bias that are relevant to the particular
study design. Rules of evidence have been established to
grade evidence according to its strength.


Jargon Simplified: Evidence-Based Practice
Evidence-based practice involves finding the available
evidence, assessing its validity, and then using the
strongest available evidence in conjunction with clinical
acumen to inform decisions regarding care.


Fig. 5.1 The hierarchy of clinical evidence.


Research Design


Clinical research can be experimental or observational. 
Observational studies can identify an association between 
two variables, but cannot establish a causal link. Experi- 
mental trials are used to address the question of causation. 
In experimental studies, the intervention is under the con- 
trol of the researcher, whereas in observational studies the 
researcher observes patients at a point in time
(cross-sectional studies) or over time (longitudinal studies). If the
observations are made by looking forward and gathering 
new data, the study is prospective; if the event of interest 
has already occurred the study is retrospective. 


Experimental Studies 


When conducting an experimental study the allocation or 
assignment of individuals is under the control of the inves- 
tigator and thus can be randomized. Experimental studies 
can be either controlled (there is a comparison group) or 
uncontrolled. Uncontrolled studies provide weaker evi- 
dence and should not be used to guide practice; they are 
typically performed early in an area of research to explore 
the safety of a new intervention, to identify unanticipated 
effects, and to gather baseline data for the planning of 
more definitive trials. Controlled experimental studies - 
or randomized controlled trials (RCTs) - are the gold stan- 
dard by which all clinical research is judged. 


Randomized Controlled Trials 


The fact that randomization keeps study groups as similar 
as possible from the outset, together with other methodo- 
logical safeguards against bias such as blinding and con- 
cealment of allocation, means that RCTs have the greatest 
potential to minimize bias. “Bias is any factor or process
that acts to deviate the results or conclusions of the study
away from the truth, causing either an exaggeration or an
underestimation of the effects of an intervention.”
Nonrandomized clinical trials have been shown to
overestimate the effectiveness of a given intervention by
up to 150%, or to underestimate it by up to 90%.


Jargon Detector: Bias 
Bias is “a systematic tendency to produce an outcome 
that differs from the underlying truth.” 


Key Concepts: Observational versus Experimental
Study Designs

Observational Studies                     Experimental Studies

They can identify an association          They can establish
between two variables, but cannot         causation.
establish a causal link.

The researcher observes patients at a     The intervention is
point in time (cross-sectional studies)   under the control of
or over time (longitudinal studies).      the researcher.


In any study involving patients there are potentially many
unknown factors (genetic, environmental, or psychosocial
factors, for example) that can have a bearing on
the outcome. Randomization, if done properly, reduces
the risk that these unknown factors will be unbalanced
in the various study groups. Random allocation can be
done by using random number tables or computer-generated
sequences. Dates of birth (even or odd), chart numbers,
or any other alternating type of sequence are inappropriate
because there is the potential for people associated
with the study, either directly or indirectly, to guess
the sequence. Although sometimes called “pseudo-” or
“quasi-randomized,” these trials are nonrandomized.
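
A computer-generated allocation sequence of the kind mentioned above might look like the following. This is only a minimal sketch, assuming Python's standard `random` module; the function name `block_randomize`, the arm labels, the block size, and the seed are illustrative assumptions and do not come from any trial discussed in this chapter. It uses permuted blocks, one common way of keeping group sizes balanced; simple unrestricted randomization would also qualify as random allocation.

```python
import random


def block_randomize(n_participants, arms=("A", "B"), block_size=4, seed=None):
    """Return a computer-generated allocation sequence using permuted blocks.

    Each block contains every arm equally often, in shuffled order, so the
    group sizes stay balanced while the next assignment stays unpredictable.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # seeded generator so the list can be reproduced
    sequence = []
    while len(sequence) < n_participants:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n_participants]


# Hypothetical example: allocate 12 participants to two arms.
allocation = block_randomize(12, seed=2024)
print(allocation)
```

Unlike dates of birth or chart numbers, the next entry in such a list cannot be guessed from the previous ones, provided the list itself is kept away from those recruiting patients.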

The vast majority of surgical trials are open trials be- 
cause both the investigator performing the surgery and 
the patient know the intervention. However, there are 
three other groups or individuals who can be blinded: 
the individual evaluating the outcome, the statistician 
doing the data analysis, and the investigators who write 
the results of the trial. To do this, the allocation code is 
not broken until these components are completed. 
Although blinding of the statistician is being done with in- 
creasing frequency, blinding of the investigator writing the 
report is rarely done. The use of sham therapy in surgical 
trials (to blind patients) has been a long-debated issue 
due to ethical concerns. However, there are reasons to 
suspect that surgical interventions, especially when per- 
formed to reduce subjective symptoms such as pain, may 
be associated with a pronounced placebo effect.⁶ A recent
trial by Moseley and colleagues has brought considerable
attention to this debate.⁷ Briefly, their trial randomized
180 patients with osteoarthritis of the knee to receive ar- 
throscopic débridement, arthroscopic lavage, or sham sur- 
gery. These investigators found that although both active 
therapies produced clinical benefit, neither was statisti- 
cally superior to the sham procedure. 


Jargon Simplified: Open Trials 

An open trial is a study in which both the investigator
performing the surgery and the patient know the
interventions.


RCTs are not appropriate or feasible for all surgical
interventions. It has been estimated that nearly 60% of surgical
research questions could not be answered by an RCT, even 
in ideal clinical circumstances. Furthermore, in cases 
where an RCT is appropriate there still exists a choice of 
trial design, each of which has strengths and limitations. 


Types of Randomized Controlled Trials 


Parallel Trials 

Most RCTs make use of a parallel design, in which partici- 
pants are randomized to two or more groups of equal 
size and each group is exposed to a different intervention. 
Parallel trials with more than one treatment arm (matched
to a control arm) provide an opportunity to study multiple
interventions or different exposures to an intervention; 
however, this also demands larger sample sizes to ensure 
such trials are adequately powered to detect clinically sig- 
nificant differences between interventions (Fig. 5.2). 
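
The effect of extra arms on sample size can be illustrated with a rough calculation. The sketch below assumes a continuous outcome and uses the standard normal-approximation formula for comparing two means; the Bonferroni adjustment for several comparisons against one control arm is one simple choice made here for illustration, not a method prescribed by this chapter. The function name `n_per_group` and the values of `delta` (smallest clinically important difference) and `sigma` (standard deviation) are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80, n_comparisons=1):
    """Approximate participants per group for comparing two means.

    Uses the normal-approximation formula
        n = 2 * (z_{1-alpha/2} + z_{power})**2 * sigma**2 / delta**2
    with a Bonferroni-adjusted alpha when several treatment arms are each
    compared with the same control arm.
    """
    adjusted_alpha = alpha / n_comparisons
    z_alpha = NormalDist().inv_cdf(1 - adjusted_alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


# Hypothetical inputs: detect a 5-point difference when the SD is 10 points.
two_arm = n_per_group(delta=5, sigma=10)
three_arm = n_per_group(delta=5, sigma=10, n_comparisons=2)
print(two_arm, three_arm)
```

Under these assumptions, adding a second treatment arm raises the number needed in every group, and the total rises further because there is one more group to fill.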


Crossover Trials 

A crossover trial design ensures that each study participant
receives all study interventions; however, the order in
which they receive the interventions is random. As such,
each participant acts as their own control, and individual
patient characteristics that may influence response are
therefore accounted for. Crossover trials produce
within-participant comparisons, and thus they require fewer
participants than parallel trials to produce statistically and
clinically significant results.

As a practical example, Assmus and colleagues conducted
a study to establish if intracoronary transplantation
of progenitor cells derived from bone marrow (BMC)
or circulating blood (CPC) may improve left ventricular 
function after acute myocardial infarction. They enrolled 
eligible patients to receive BMC, CPC, or no cell infusion 
and used a crossover design to have participants act as 
their own control. 




The trial by Assmus et al. was suitable for a crossover design
because the following essential criteria were met: (1) Study
participants were afflicted with a chronic or incurable con- 
dition. Conditions that may be resolved after a single inter- 
vention would not be suitable for crossover trials. (2) Study 
interventions had a rapid onset and a short duration of ef- 
fect. Interventions with a long duration of effect risk a car- 
ryover effect - when the effect of an intervention persists 
during the testing of another intervention. When the dura- 
tion of an intervention is known, treatment periods can be 
separated by sufficient time to allow for the effect to run its 
course. This period between treatments is known as the 
washout period. (3) The condition under study was stable 
over time to ensure that any effect noted during the study 
could be attributed to the treatment provided and not sim- 
ply a change in the condition that would have occurred 
with or without treatment. Differences between study 
periods that are the result of fluctuations in the condition 
being studied, and not the result of an intervention, are 
known as period effects. 


Factorial Design Trials 
An RCT using a factorial design allows interventions to be 
evaluated both individually and in combination with one 
another. Therefore, two or more hypotheses can be ex- 
plored in one experiment and the effect of combination 
therapy can be assessed. For example, in an RCT using a 
2 x 2 factorial design, participants are allocated to one of 
four possible combinations: (1) treatment A, (2) treatment 
B, (3) treatment A and B, or (4) no treatment (Fig. 5.3). As a 
practical example, consider the trial by Daltroy and collea- 
gues, in which they used a 2 x 2 factorial RCT design to ex- 
plore the effect of two common psychoeducational pro- 
grams, alone or in combination, on postoperative outcome 
of 222 elderly patients undergoing total hip or knee
replacement.¹⁰

With factorial trials, there may be an interaction between
interventions, and this may have an impact on the
sample size. Interactions are more common when
interventions share similarities in their mechanisms of action;
they result when the effect of one intervention is
influenced by another intervention. For example, in the case
of a negative interaction, in which the overall effect of
individual interventions is reduced when they are provided
together, the sample size would need to be increased to
still detect a clinically significant difference.
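
The four arms of a 2 x 2 factorial design, and what a negative interaction means numerically, can be sketched as follows. This is an illustrative sketch only: the helper names `factorial_arms` and `interaction_contrast` and the mean improvement scores are hypothetical, and the interaction is measured on a simple additive scale where a higher score is assumed to be better.

```python
from itertools import product


def factorial_arms(factors=("A", "B")):
    """List every arm of a full factorial design as a dict of factor -> bool."""
    return [dict(zip(factors, combo))
            for combo in product((False, True), repeat=len(factors))]


def interaction_contrast(mean_none, mean_a, mean_b, mean_ab):
    """Additive interaction in a 2 x 2 design.

    Zero means the combined effect equals the sum of the individual effects;
    a negative value means the treatments achieve less together than expected.
    """
    effect_a = mean_a - mean_none
    effect_b = mean_b - mean_none
    effect_ab = mean_ab - mean_none
    return effect_ab - (effect_a + effect_b)


arms = factorial_arms()
for arm in arms:
    print(arm)

# Hypothetical mean improvements in an outcome score for the four arms.
print(interaction_contrast(mean_none=0.0, mean_a=4.0, mean_b=3.0, mean_ab=5.0))
```

In this made-up example the combination improves the score by less than the two individual effects added together, which is the situation in which a larger sample would be needed.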


n-of-1 Trials

A criticism of the preceding RCT designs is that they may
provide good information on treatment outcome for the
average patient, but are poorly equipped to provide
individual-specific information. Randomized trials that include
only one patient are possible, and require limited
resources. Such n-of-1 studies are conducted by systematically
varying the management of a patient’s illness during
a series of treatment periods (alternating between
experimental and control interventions) to confirm the
effectiveness of treatment in the individual patient.¹¹ The number
of pairs of interventions often varies from two to seven,
but this number is not specified in advance and the
clinician and patient can decide to stop when it becomes clear
that there are, or are not, important differences between
interventions.

As an example, Nesathurai and Harvey reported on a
patient who underwent endoscopic laser ablation of the
left T2 sympathetic ganglion. The patient subsequently
developed severe left-sided facial perspiration, and the
investigators used an n-of-1 trial design to establish the efficacy
of a clonidine patch to decrease the incidence and
frequency of gustatory facial sweating.


Observational Studies


As stated above, there are situations where RCTs may not
be necessary, appropriate, ethical, or feasible. In general,
questions of therapy are best answered by RCTs (or
meta-analyses of RCTs if available), whereas questions of
diagnosis, prognosis, and causation may be best addressed
by observational studies. Observational studies, which are
frequently undertaken in surgery, provide weaker
empirical evidence than do experimental studies because of the
potential for large confounding biases to be present.

To a large extent, the type of observational study done
depends on the rarity of the disease or condition and on
issues related to human resources and economics. Usually,
several methods of answering a question are possible
and the strongest design should be used.


Key Concepts: When to Use an Observational Study
In general, questions of therapy are best answered by
RCTs (or meta-analyses of RCTs if available), whereas
questions of diagnosis, prognosis, and causation may
be best addressed by observational studies.


Types of Observational Studies


The Cohort (Incidence) Study

In a cohort study, it is known at the outset whether people
have been exposed to a treatment or possible causal
agent (e.g., a type of trauma, surgical procedure, or drug),
and they are divided into groups or cohorts (exposed versus
nonexposed) on this basis. They are then followed forward
in time (prospectively) for years or even decades to see
how many in each group develop a particular outcome.
These studies are usually less expensive and easier to
administer than RCTs. They may also be ethically more
acceptable because a potentially beneficial treatment is not
withheld, and conversely, a possibly harmful treatment is
not given. Cohort studies are susceptible to bias by
differential loss to follow-up and the lack of control over risk
assignment. The major disadvantage is that one can never be
sure that the cohorts were well matched and that there
were not other factors that might have influenced the
results. In addition, for rare disorders, the sample size or
length of follow-up needed to show an effect may be
prohibitively large.

As an example, Ciminiello and colleagues recently re- 
ported on the results of total hip arthroplasty managed 
by either regular (n = 60) or small incision technique 
(n = 60). The two groups were matched for age, sex, body 
mass index, American Society of Anesthesiologists Score, 
diagnosis (osteoarthritis), prosthesis, type of fixation,
anesthesia, surgical approach, and intraoperative patient
positioning. The authors did not find any differences in
outcomes, but until an RCT is conducted questions will remain
about the relative effectiveness of these competing proce- 
dures. 

A variation of a cohort study is a longitudinal study in 
which there is only one group. This group is comprised of 
individuals who have a positive screening test or who 
have all been diagnosed with an early stage of a disease. 
They are then followed and evaluated on a repeated basis 
to assess the development of the disease or for particular 
outcome measures. For example, Cheung and colleagues 
used a longitudinal study design to quantify curve progres- 
sion in 105 patients with idiopathic scoliosis. 
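
The comparison between exposed and nonexposed cohorts described in this section is commonly summarized as a relative risk: the proportion of exposed people who develop the outcome divided by the proportion of nonexposed people who do. The following minimal sketch shows the arithmetic; the function name `relative_risk` and the counts are hypothetical and are not taken from any study cited here.

```python
def relative_risk(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Risk in the exposed cohort divided by risk in the nonexposed cohort."""
    risk_exposed = exposed_events / exposed_total
    risk_unexposed = unexposed_events / unexposed_total
    return risk_exposed / risk_unexposed


# Hypothetical counts: 12 of 60 exposed and 6 of 60 nonexposed patients
# develop the outcome during follow-up.
rr = relative_risk(12, 60, 6, 60)
print(round(rr, 2))
```

A relative risk above 1 suggests the outcome is more frequent in the exposed cohort, but, as noted above, unmatched cohorts or differential loss to follow-up can distort this figure.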


The Case-Control Study 

This is a retrospective observational study in which the
proportion of cases with a potential risk factor is compared
with the proportion of controls (individuals without the
disease or event) with the same risk factor. This is a
relatively quick and inexpensive study and is often the best
design for rare disorders or when there is a long time lag
between the exposure and the outcome. As with cohort
studies, case-control studies are vulnerable to unmeasured
confounding variables. The members of the case and
control groups must have had an equal likelihood of
developing the disease under study in their baseline preexposure
state. Variables that are linked to both exposure and
predisposition to disease are referred to as confounding variables.
The impact of those factors must be assessed and reflected
in the controls; this issue can be addressed, in part, by
matching members of the case and control groups
on known confounding variables.

Wahr and colleagues used a case-control design to explore
whether abnormal preoperative serum potassium
levels were associated with adverse perioperative events. 
Two thousand four-hundred and two patients undergoing 
elective coronary artery bypass grafting comprised their 
sample. Outcomes were intraoperative and postoperative 
arrhythmias, the need for cardiopulmonary resuscitation, 
cardiac death, and death due to any cause prior to dis- 
charge, by preoperative serum potassium level. An RCT 
would have been unethical, and a cohort design would be 
difficult due to the anticipated low prevalence of outcome 
events. 
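
Because a case-control study samples people by outcome, the comparison of exposure between cases and controls is usually expressed as an odds ratio. The sketch below is illustrative only: the function names, the 2 x 2 counts, and the use of the Woolf (log-scale) approximation for the confidence interval are assumptions made for this example, not details of the study described above.

```python
import math


def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    return (cases_exposed * controls_unexposed) / (cases_unexposed * controls_exposed)


def odds_ratio_ci(cases_exposed, cases_unexposed, controls_exposed,
                  controls_unexposed, z=1.96):
    """Approximate 95% confidence interval on the log-odds scale (Woolf method)."""
    log_or = math.log(odds_ratio(cases_exposed, cases_unexposed,
                                 controls_exposed, controls_unexposed))
    se = math.sqrt(1 / cases_exposed + 1 / cases_unexposed
                   + 1 / controls_exposed + 1 / controls_unexposed)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical 2 x 2 table: 30 of 100 cases and 15 of 100 controls were exposed.
estimate = odds_ratio(30, 70, 15, 85)
lower, upper = odds_ratio_ci(30, 70, 15, 85)
print(round(estimate, 2), round(lower, 2), round(upper, 2))
```

An odds ratio whose interval excludes 1 points to an association between exposure and outcome; it does not remove the influence of unmeasured confounding variables.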


Cross-Sectional (Prevalence) Studies

A cross-sectional study is a snapshot of a population at a
specific point in time: a descriptive study of the
relationship between an outcome (disease, trauma, etc.) and other
factors, usually in a defined population. Cross-sectional
studies are effective in determining the prevalence of a
disease or event (the proportion of the population that has
the outcome of interest at a particular point in time) and
the coexistence of associated variables. They do not give
us information about disease or event incidence (the
proportion of the population that gets the outcome of interest
over a specified period of time). As an example, Nikolajsen
and colleagues¹⁶ surveyed 1231 consecutive patients who
had previously undergone total hip arthroplasty to explore
the prevalence of postoperative chronic pain, and to capture
variables to explore associations. Although this type of
study is relatively easy and inexpensive to carry out, it
can only establish an association, not a cause-and-effect
relationship. In addition, accurate reporting of the exposure
and/or the outcome of interest may depend on patients’
recall of past events.


Case Reports and Case Series

Case reports and case series are often used to describe a
rare disorder or a novel aspect of a less rare condition, a
new treatment or innovation, or adverse effects of an
intervention. They often provide a richness of information
that cannot be conveyed in a trial. The description of
cases may alert surgeons to important new problems and
then allow hypotheses to be developed, leading to focused
studies of stronger design. Case reports and case series are
of little use in guiding clinical practice because isolated
observations are collected in an uncontrolled, unsystematic
manner and the information gained cannot be generalized
to a larger population of patients.


Systematic Reviews and Meta-Analyses


Basing important clinical decisions on single trials,
especially when the result is a change in treatment policy, is
often troublesome. Because of the numbers of patients
needed to detect small to moderate differences for
clinically important outcome measures, definitive answers
may not be found in single studies, unless they are
well-designed large trials. Trials of this size, which usually involve
many thousands of patients, have rarely been performed in
surgery.

When the information from all relevant trials addressing
the same question is combined using well-established,
rigorous methodology, the result is a systematic review or
overview. If the results of each trial are reported in such
a way that they can be combined statistically, the result
is a meta-analysis. Although systematic reviews and
meta-analyses are observational, retrospective research
studies, they employ scientific methods to control bias
and, in doing so, provide potent methods for synthesizing
and summarizing data. Systematic reviews are considered
the highest level in the evidence hierarchy for informing
clinical decision making.


Conclusions


Although there are many surgical questions that cannot, or
should not, be informed by an RCT, when plausible, this
study design (and particularly meta-analyses of RCTs)
provides evidence that is least likely to be affected by bias.
Regardless of the research design, the methodological rigor
under which the study was conducted must always be
evaluated to properly assess the strength of its findings.
Each study design is discussed in detail in subsequent
chapters.


Hierarchy of Research Studies: From Case Series to Meta-Analyses


“Evidence-based medicine emphasizes a hierarchy of evidence
to help with clinical decision making.”

— Schunemann & Bone, 2003¹


Summary


Common hierarchies of different study designs in
evidence-based medicine are reviewed. A system for grading
the quality of evidence is also provided.


Introduction


Evidence-based medicine involves the conscientious use of
current best evidence from clinical care research in making
health care decisions. A fundamental principle of
evidence-based medicine is that a hierarchy of research study
design exists. Surgeons practicing evidence-based medicine
use the best available evidence, integrated with clinical
expertise, patients’ values, and clinical circumstances, and
apply the results of the research evidence to their patient
care. It is necessary to place the available literature into a
hierarchy of evidence to allow for clearer communication
when discussing studies in day-to-day activities such as
teaching rounds or discussions with colleagues, when
practicing evidence-based medicine, and when conducting a
systematic review of the literature to establish a
recommendation for practice. In addition to understanding the
hierarchy of evidence, it is important to understand study
design and study quality, as well as other aspects that
can make placing a study within the hierarchy difficult.

For orthopaedic surgeons, the integration of research
evidence in their practice requires an understanding of
what constitutes high-quality and low-quality evidence.¹
For instance, a case report describing a surgical
intervention makes one less certain about an intervention when
compared with evidence summarized in a meta-analysis
from a systematic review of multiple large, high-quality
randomized controlled trials.¹ To acknowledge these
quality issues that also are linked to confidence in the results,
evidence-based medicine emphasizes a hierarchy of
evidence to help with clinical decision making. Confidence
in research results is greatest if systematic error and bias
are low.¹


Jargon Simplified: Bias
Bias is “a systematic tendency to produce an outcome
that differs from the underlying truth.”¹


Several hierarchies of evidence and systems for grading the
evidence exist, ranging from simply evaluating the study
design to complex tables that take into account the
methodological considerations of each study design. The
orthopaedic surgical literature can be classified into articles
with a primary interest in therapy, prognosis, harm, and
economic analysis. In addition, within each classification
there is a hierarchy of study design. Recently, some
orthopaedic journals, including the Journal of Bone and Joint
Surgery American Volume and Clinical Orthopaedics and
Related Research, have adopted the reporting of levels of
evidence with individual studies. In many cases, the levels
of evidence and grading system adopted from the Oxford
Centre for Evidence-Based Medicine have been
implemented (Tables 6.1, 6.2, 6.3, and 6.4).⁷ These four tables
provide a hierarchy of study design for therapeutic studies,
prognostic studies, diagnostic studies, and economic and
decision analyses, which are reviewed and discussed in
this chapter.


Table 6.1 Levels of Evidence for Therapeutic Studies

Level of Evidence   Therapeutic Studies: Investigating the Results of Treatment

Level I             • High-quality randomized controlled trial with statistically
                      significant difference or no statistically significant
                      difference, but narrow confidence intervals
                    • Systematic review of level I randomized controlled trials
                      (and study results were homogeneous)

Level II            • Lesser-quality randomized controlled trial (e.g., <80%
                      follow-up, no blinding, or improper randomization)
                    • Prospective comparative study
                    • Systematic review of level II studies or level I studies
                      with inconsistent results

Level III           • Case-control study
                    • Retrospective comparative study
                    • Systematic review of level III studies

Level IV            • Case series

Level V             • Expert opinion


Table 6.2 Levels of Evidence for Prognostic Studies

Level of Evidence   Prognostic Studies: Investigating the Effect of a Patient
                    Characteristic on the Outcome of Disease

Level I             • High-quality prospective study (all patients were enrolled
                      at the same point in their disease with ≥80% follow-up
                      of enrolled patients)
                    • Systematic review of level I studies

Level II            • Retrospective study
                    • Untreated controls from a randomized controlled trial
                    • Lesser-quality prospective study (e.g., patients enrolled
                      at different points in their disease or <80% follow-up)
                    • Systematic review of level II studies

Level III           • Case-control study

Level IV            • Case series

Level V             • Expert opinion


Table 6.3 Levels of Evidence for Diagnostic Studies

Level of Evidence   Diagnostic Studies: Investigating a Diagnostic Test

Level I             • Testing of previously developed diagnostic criteria in
                      series of consecutive patients (with universally applied
                      reference gold standard)
                    • Systematic review of level I studies

Level II            • Development of diagnostic criteria on basis of consecutive
                      patients (with universally applied reference gold
                      standard)
                    • Systematic review of level II studies

Level III           • Study of nonconsecutive patients (without consistently
                      applied reference gold standard)
                    • Systematic review of level III studies

Level IV            • Case-control study
                    • Poor reference standard

Level V             • Expert opinion


Table 6.4 Levels of Evidence for Economic Analyses and Decision Models

Level of Evidence   Economic and Decision Analyses: Developing an Economic or
                    Decision Model

Level I             • Sensible costs and alternatives, values obtained from many
                      studies, multiway sensitivity analyses
                    • Systematic review of level I studies

Level II            • Sensible costs and alternatives, values obtained from
                      limited studies, multiway sensitivity analyses
                    • Systematic review of level II studies

Level III           • Analyses based on limited alternatives and costs, poor
                      estimates
                    • Systematic review of level III studies

Level IV            • No sensitivity analyses

Level V             • Expert opinion


Common Hierarchies of Study Design for Treatment Studies 


Therapeutic studies investigate the results of a treatment
or intervention. The simplest hierarchy of evidence system
involves looking only at the study design, not study quality.
For instance, studies evaluating a therapy, such as a novel
orthopaedic surgical technique, commonly use the following
hierarchy of evidence in descending order:

• Meta-analyses and systematic reviews of randomized
  controlled trials
• Randomized controlled trials
• Cohort studies
• Case-control studies
• Case series
• Case reports
• Narrative reviews and editorials (expert opinions)
• Animal research
• In vitro research
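
The design-only hierarchy listed above lends itself to a simple lookup when sorting a reading list. The following is a minimal sketch under stated assumptions: the names `HIERARCHY`, `RANK`, and `sort_by_evidence`, the wording of the design labels, and the example studies are all hypothetical, and the ranking deliberately ignores study quality, as this simplest system does.

```python
# Study designs in descending order of evidence, following the list above.
HIERARCHY = [
    "meta-analysis or systematic review of RCTs",
    "randomized controlled trial",
    "cohort study",
    "case-control study",
    "case series",
    "case report",
    "narrative review or editorial",
    "animal research",
    "in vitro research",
]

# Rank 1 is the strongest design; larger numbers are weaker designs.
RANK = {design: position for position, design in enumerate(HIERARCHY, start=1)}


def sort_by_evidence(studies):
    """Order studies from strongest to weakest design (quality is ignored)."""
    return sorted(studies, key=lambda study: RANK[study["design"]])


# Hypothetical reading list for a question about a surgical technique.
reading_list = [
    {"title": "Study X", "design": "case series"},
    {"title": "Study Y", "design": "randomized controlled trial"},
    {"title": "Study Z", "design": "cohort study"},
]
for study in sort_by_evidence(reading_list):
    print(RANK[study["design"]], study["title"], study["design"])
```

A grading system that also weighs study quality, such as the levels in Tables 6.1 to 6.4, would need more inputs than the design label alone.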


Jargon Simplified: Randomized Controlled Trial
A randomized controlled trial is “an experiment in
which individuals are randomly allocated to receive or
not receive an experimental preventative, therapeutic,
or diagnostic procedure and then followed to determine
the effect of the intervention.”⁸


Jargon Simplified: Cohort Study
A cohort study is a “prospective investigation of the
factors that might cause a disorder in which a cohort of
individuals who do not have evidence of an outcome of
interest but who are exposed to the putative cause are
compared with a concurrent cohort who are also free
of the outcome but not exposed to the putative cause.
Both cohorts are then followed to compare the incidence
of the outcome of interest.”


Jargon Simplified: Case-Control Study
A case-control study is “a study designed to determine
the association between an exposure and outcome in
which patients are sampled by outcome (some patients
with the outcome of interest are selected and compared
with a group of patients who have not had the outcome),
and the investigator examines the proportion of patients
with the exposure in the two groups.”


This ranking has an evolutionary order, moving from simple
observational methods at the bottom of the list to
increasingly sophisticated and statistically refined
methodologies at the top. In other words, as the research
design becomes more rigorous, moving from the bottom of the
list to the top, the quality of evidence increases and the
chance for bias decreases. The randomized controlled trial,
by the nature of randomization, helps reduce bias. This is
extremely important because bias can confound the outcome of
a study such that the study may over- or underestimate the
treatment effect. Randomization helps to control for both
known and unknown prognostic variables within the sample
population and therefore helps attain a more accurate
estimate of the truth. Observational studies are more prone
to bias than the randomized controlled trial because it is
not possible to control for all prognostic factors,
particularly unknown ones.

Similar systems have been proposed by Schunemann and Bone
(Table 6.5) and by Guyatt and Rennie (Table 6.6). Guyatt and
Rennie rank the n-of-1 randomized controlled trial at the
top of the hierarchy of evidence. A criticism of the other
randomized designs is that they may provide good information
on treatment outcome for the average patient, but are poorly
equipped to provide individual-specific information.
Randomized controlled trials in individuals are possible and
require limited resources. Such n-of-1 studies are conducted
by systematically varying the management of a patient’s
illness during a series of treatment periods (alternating
between experimental and control interventions) to confirm
the effectiveness of treatment in the individual patient.
The number of pairs of interventions often varies from two
to seven, but this number is not specified in advance and
the clinician and patient can decide to stop when it becomes
clear that there are, or are not, important differences
between interventions.


Table 6.5 Hierarchy of Study Design Recommended by
Schunemann and Bone

Randomized controlled trial
Controlled trials (before and after studies)
Case-control studies and cohort studies
Cross-sectional studies
Expert opinions, case reports, and case series

Source: Data from Schunemann HJ, Bone L, Part IV. Evidence-based
orthopaedics: a primer. Clin Orthop Relat Res 2003;413:117-132.


Table 6.6 Hierarchy of Study Design Recommended by
Guyatt and Rennie

n-of-1 randomized controlled trial
Systematic review of randomized controlled trials
Single randomized controlled trials
Systematic review of observational studies addressing patient-important outcomes
Single observational study addressing patient-important outcomes
Physiologic studies
Unsystematic clinical observations

Source: From Guyatt GH, Rennie D, eds. Users’ Guides to the
Medical Literature: A Manual for Evidence-based Clinical Practice.
Chicago, IL: American Medical Association Press 2001. Reprinted
by permission.


Key Concepts: n-of-1 Trials 

“n of 1 trials are conducted by systematically varying the 
management of a patient’s illness during a series of 
treatment periods (alternating between experimental 
and control interventions) to confirm the effectiveness 
of treatment in the individual patient.” 
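
The alternating design of an n-of-1 trial can be illustrated with a short sketch. It builds a schedule of treatment pairs for a single patient. The function name `n_of_1_schedule` is hypothetical, and choosing the order within each pair at random is an assumption of this sketch rather than something specified in the text above.

```python
import random


def n_of_1_schedule(n_pairs, seed=None):
    """Return a list of treatment pairs for one patient.

    Each pair contains one experimental and one control period.
    The order within a pair is drawn at random (an assumption of
    this sketch; the text only states that the periods alternate).
    """
    if n_pairs < 1:
        raise ValueError("n_pairs must be at least 1")
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_pairs):
        pair = ["experimental", "control"]
        rng.shuffle(pair)
        schedule.append(tuple(pair))
    return schedule


if __name__ == "__main__":
    # Three pairs; in practice the clinician and patient may stop
    # earlier or later once the difference becomes clear.
    for number, pair in enumerate(n_of_1_schedule(3, seed=1), start=1):
        print(f"Pair {number}: {pair[0]} then {pair[1]}")
```

Because the number of pairs is not fixed in advance, a schedule like this would typically be extended one pair at a time until the clinician and patient agree to stop.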


The levels of evidence table proposed by the Centre for
Evidence-Based Medicine (Oxford, United Kingdom) (Table 6.1)
takes into consideration both study design and the
methodological quality of the trials. Two types of study are
classified as level I evidence: high-quality randomized
controlled trials that show either a statistically
significant difference or no statistically significant
difference but narrow confidence intervals, and systematic
reviews of level I randomized controlled trials with
homogeneous study results.

When assessing the quality of a systematic review or
meta-analysis, it is necessary to look at the study design
and quality of the trials included in it. Meta-analyses of
randomized controlled trials take data from individual
randomized controlled trials and statistically pool them.
This increases the number of patients from whom the data
were obtained, thereby increasing the effective sample size.
A major limitation of this pooling is that it is dependent
on the quality of the randomized controlled trials that were
included in the meta-analysis. Meta-analyses are discussed
in Chapter 12.
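
To make the idea of statistical pooling concrete, the following sketch combines effect estimates from several trials using inverse-variance (fixed-effect) weighting. This is one common pooling method, chosen here for illustration; the text does not say which method a given meta-analysis uses. The function name `pool_fixed_effect` and the trial values are hypothetical.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Pool trial estimates with inverse-variance weights.

    Each trial is weighted by 1 / SE^2, so more precise trials
    count for more. Returns the pooled estimate and its standard
    error.
    """
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


if __name__ == "__main__":
    # Hypothetical log relative risks and standard errors
    # from three trials (illustrative values only).
    log_rr = [-0.20, -0.35, -0.10]
    se = [0.20, 0.25, 0.15]
    pooled, pooled_se = pool_fixed_effect(log_rr, se)
    print(f"pooled log RR = {pooled:.3f}, SE = {pooled_se:.3f}")
```

The pooled standard error is always smaller than that of any single trial, which is the "effective sample size" gain described above. The pooled value is only as trustworthy as the trials that feed into it.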

Lesser-quality randomized controlled trials (e.g., <80% 
patient follow-up, no blinding, or improper randomiza- 
tion), prospective comparative studies, and systematic re- 
views of level II studies or level I studies with inconsistent 
results are considered to be level II evidence. The prospec- 
tive comparative or cohort study includes two groups of pa- 
tients, one who received the intervention of interest and 
one who did not. Both groups are followed over time and 
the outcomes are assessed and compared. Due to the pro- 
spective nature, data collection and follow-up can be more 
closely monitored and attempts can be made to ensure that 
data collection and follow-up are complete and accurate.° 

Level III evidence includes the case-control study, the
retrospective comparative study, and systematic reviews of
level III studies. The case-control study starts with a
group that has the outcome of interest and looks back,
comparing it with similar individuals who do not have the
outcome, to see which factors may have been present in the
study group and may be associated with the outcome.
Case-control studies are retrospective in nature and are
prone to bias. Chapter 11 discusses the advantages and
disadvantages of case-control studies. The retrospective
comparative study has potential for bias if there are
incomplete data and follow-up, which often occurs with
retrospective study designs. Chapter 10 describes both
prospective and retrospective cohort studies in greater
detail.

Case series are classified as level IV evidence. Case series 
are typically retrospective in nature and have no compar- 
ison group. They provide the outcomes for only one group 
in the population, usually those who received the inter- 
vention, and there is a high potential for bias especially if 
there is incomplete data collection or patient follow-up.° 
Because these studies are typically single-center or sin- 
gle-surgeon initiatives, the generalizability of the results 
may be limited.° Results of case series need to be inter- 
preted cautiously. In Chapter 8, the case series is described 
in detail. Expert opinion is classified as level V evidence. 
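
The classification described above can be summarized as a small lookup. This is a simplified sketch: the names `rct_is_high_quality` and `therapeutic_level` are hypothetical, the quality check covers only the three criteria named in the text (follow-up, blinding, randomization), and systematic reviews and the confidence-interval criterion are left out.

```python
# Levels for non-randomized designs, as described in the text.
LEVEL_BY_DESIGN = {
    "prospective comparative study": "II",
    "case-control study": "III",
    "retrospective comparative study": "III",
    "case series": "IV",
    "expert opinion": "V",
}


def rct_is_high_quality(follow_up_rate, blinded, properly_randomized):
    """Apply the three quality criteria named in the text.

    A trial with <80% follow-up, no blinding, or improper
    randomization counts as lesser quality.
    """
    return follow_up_rate >= 0.80 and blinded and properly_randomized


def therapeutic_level(design, high_quality=True):
    """Return the level of evidence (I to V) for a therapeutic study."""
    if design == "randomized controlled trial":
        return "I" if high_quality else "II"
    if design not in LEVEL_BY_DESIGN:
        raise ValueError(f"unknown study design: {design}")
    return LEVEL_BY_DESIGN[design]


if __name__ == "__main__":
    quality = rct_is_high_quality(0.72, blinded=True, properly_randomized=True)
    print(therapeutic_level("randomized controlled trial", quality))
    print(therapeutic_level("case series"))
```

In this sketch a randomized trial with 72% follow-up is downgraded to level II, which mirrors the rule for lesser-quality trials stated above.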

Although the above hierarchies of study design and the
levels of evidence vary, the premise is similar among them:
high-quality randomized controlled trials are at the top of
the list, followed by observational study designs in the
center, and physiologic studies and unsystematic clinical
observations at the bottom. Inferences may be very strong if
results come from a systematic review of methodologically
strong randomized controlled trials with consistent results.
However, inferences should be weaker if only a single
randomized controlled trial is being considered, unless it
is very large and the investigators have enrolled a diverse
patient population. Because observational studies may over-
or underestimate treatment effects in an unpredictable
fashion, the results of observational studies are far less
trustworthy than those of randomized controlled trials.
Physiologic studies and unsystematic clinical observations
provide the weakest inferences about treatment effects.


Common Hierarchies of Study Design for Prognostic Studies


Clinicians require studies of patient prognosis, which ex- 
amine the possible outcomes of a disease or injury and 
the probability with which they can be expected to occur.” 
Although orthopaedic surgeons strive to restore health 
and function, sometimes they can only offer relief of dis- 
comfort and preparation for death or long-term disability 
by means of presenting the expected future course of the 
patient’s illness or injury.” To estimate a patient’s prog- 
nosis, we examine outcomes in groups of patients with a 
similar clinical presentation, for example, patients in the 
first year after a grade IIIc tibial shaft fracture. We may
then refine our prognosis by looking at subgroups defined by
factors such as age and comorbidities (such as diabetes).
Prognostic factors are variables that predict which patients
will have better or worse outcomes.

Several study designs are commonly used for prognostic
studies that investigate the effect of a patient
characteristic on the outcome of disease. In the levels of
evidence table proposed by the Centre for Evidence-Based
Medicine (Table 6.2), high-quality prospective studies and
systematic reviews of level I studies are considered level I
evidence. To be considered high quality, a prospective study
should have enrolled all patients at the same point in their
disease and achieved >80% follow-up of enrolled patients.
Several study designs are classified as level II evidence,
including retrospective studies, untreated controls from a
randomized controlled trial, lesser-quality prospective
studies, and systematic reviews of level II studies.
Lesser-quality prospective studies have patients enrolled at
different points in their disease or <80% patient follow-up.
This system classifies case-control studies as level III
evidence, case series as level IV evidence, and expert
opinion as level V evidence.


Common Hierarchies of Study Design for Diagnostic Studies 


A diagnostic test is useful only to the extent that it
distinguishes between conditions or disorders that might
otherwise be confused. The accuracy of a diagnostic test is
best determined by comparing it to the truth, or the gold
standard; however, several lesser-quality study designs are
often used to investigate a diagnostic test. Level I studies
include the testing of previously developed diagnostic
criteria in a series of consecutive patients with a
universally applied reference gold standard and systematic
reviews of level I studies (Table 6.3). Studies that develop
diagnostic criteria based on consecutive patients with a
universally applied reference gold standard and systematic
reviews of level II studies are considered level II
evidence. Level III evidence consists of studies of
nonconsecutive patients without a consistently applied
reference gold standard and systematic reviews of level III
studies. Case-control studies and studies using a poor
reference standard are level IV evidence, and expert opinion
is level V evidence.


Common Hierarchies of Study Design for Economic and Decision Analyses 


Economic analysis is a set of formal, quantitative methods 
used to compare alternative strategies with respect to their 
resource use and their expected outcomes.’ A full eco- 
nomic analysis must consider both the costs and the out- 
comes or consequences. How to conduct and evaluate 
economic analyses in orthopaedic surgery is described in 
Chapter 13. 

The Centre for Evidence-Based Medicine’s level of evidence
table (Table 6.4) provides a complex classification system
that takes methodological quality into consideration when
evaluating economic analyses and decision models. Studies
that include sensible costs and alternatives, values
obtained from many studies, and multiway sensitivity
analyses are considered to be level I evidence. In addition,
systematic reviews of level I studies are also classified as
level I evidence. Level II studies are similar to level I
studies, except that the values are obtained from limited
studies. Economic analyses that are based on limited
alternatives and costs and provide poor estimates are
considered to be level III evidence. Studies that do not
include sensitivity analyses are classified as level IV
evidence, and expert opinion is considered to be level V
evidence.


Jargon Simplified: Sensitivity Analysis 

Sensitivity analysis is “any test of the stability of the con- 
clusions of healthcare evaluation over a range of prob- 
ability estimates, value judgments, and assumptions 
about the structure of the decisions to be made. This 
may involve the repeated evaluation of a decision model 
in which one or more of the parameters of interest are 
varied.” 
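
A one-way sensitivity analysis can be sketched in a few lines. The decision model below is hypothetical: two strategies, each with a procedure cost and a complication probability, compared on expected cost only. All names and numbers are assumptions made for illustration. The point is to show how a conclusion is retested while one parameter is varied.

```python
def expected_cost(procedure_cost, p_complication, complication_cost):
    """Expected cost of a strategy in a very simple decision model."""
    return procedure_cost + p_complication * complication_cost


def one_way_sensitivity(p_values, cost_a=5000.0, cost_b=7000.0,
                        p_b=0.05, complication_cost=20000.0):
    """Vary the complication probability of strategy A only.

    Returns one tuple per tested value: the probability, the
    expected cost of A, the expected cost of B, and the cheaper
    strategy. Strategy B's parameters are held fixed.
    """
    fixed_b = expected_cost(cost_b, p_b, complication_cost)
    results = []
    for p in p_values:
        varied_a = expected_cost(cost_a, p, complication_cost)
        preferred = "A" if varied_a < fixed_b else "B"
        results.append((p, varied_a, fixed_b, preferred))
    return results


if __name__ == "__main__":
    for p, a, b, best in one_way_sensitivity([0.05, 0.10, 0.20, 0.25]):
        print(f"p(complication A) = {p:.2f}: A = {a:.0f}, B = {b:.0f}, prefer {best}")
```

If the preferred strategy changes within the plausible range of the varied parameter, the conclusion is sensitive to that estimate. A multiway analysis would vary several parameters at once in the same manner.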
Grades of Recommendation


Assessing the quality of a study in relation to levels of evi- 
dence is extremely important when developing grades of 
recommendation.’ A grade of recommendation can only 
be developed after a thorough systematic review of the lit- 
erature and in many cases, after discussions with content 
experts.’ When developing grades of recommendation, it 
is extremely important to place weights on studies with 
more weight being given to studies that are high on the 
hierarchy of evidence and of high quality.* In contrast, 
less weight should be given to studies that are low on the 
hierarchy of evidence and of lower methodological quality. 

The GRADE (Grading of Recommendations Assessment,
Development, and Evaluation) Working Group suggests a system
for grading the quality of the evidence obtained from a
thorough systematic review (Table 6.7). This grading system
should be applied to all of the outcomes of interest. Once
the total evidence has been graded, a recommendation for the
treatment can be made. The GRADE Working Group suggests that
four areas should be considered when making a recommendation
for a treatment:

• What are the benefits versus the harm? Are there clear
  benefits to an intervention or is there more harm as a
  result?
• What is the quality of the evidence?
• Are there modifying factors affecting the clinical setting
  such as the proximity of qualified persons able to carry
  out the intervention?
• What is the baseline risk for the potential population
  being treated?

Once the above factors have been considered, the
recommendation should be placed into one of four categories:

• Do it.
• Probably do it.
• Probably don’t do it.
• Don’t do it.


The grades of “do it” or “don’t do it” are defined as a
judgment that most well-informed people would make, and the
grades of “probably do it” and “probably don’t do it” are
defined as a judgment that the majority of well-informed
people would make, but a substantial minority would not. The
development of the grades of recommendation based on the
GRADE Working Group system gives one the tools to convey the
best available evidence to the patient as well as to help
the literature guide a busy clinician.


A Note on Using the Best Available Evidence


Although randomized controlled trials provide the highest
quality evidence, evidence-based medicine is not the science
of randomized controlled trials alone, to the neglect of
other study designs. Evidence-based medicine acknowledges
explicitly that a large body of state-of-the-art evidence is
derived from observational studies because higher-quality
evidence often is not available. In orthopaedic surgery, few
high-quality randomized controlled trials have been
conducted evaluating existing surgical treatments.
Therefore, orthopaedic surgeons are frequently forced to
look for lower-quality evidence. For example, if a
randomized controlled trial has not been conducted
evaluating a new treatment of interest, one may then look
for cohort studies for the best available evidence. If a
cohort study does not exist, one may review case series for
the best available evidence. Therefore, the practice and
application of evidence-based medicine requires an
understanding and critical evaluation of all study designs.
In addition to study design, the rules for formal assessment
of evidence mentioned above allow critical appraisal of
whether studies are executed and analyzed correctly.


Table 6.7 Criteria for Assigning the Grades of Evidence

Type of Evidence

Study design
Randomized trial = high quality
Quasi-randomized trial = moderate quality
Observational study = low quality
Any other evidence = very low quality

Decrease grade if
Serious (-1) or very serious (-2) limitation to study quality
Important inconsistency (-1)
Some (-1) or major (-2) uncertainty about directness
Imprecise or sparse data (-1)
High probability of reporting bias (-1)

Increase grade if
Strong evidence of association - significant relative risk of >2 (<0.5) based on consistent evidence from two or more observational studies, with no plausible confounders (+1)
Very strong evidence of association - significant relative risk of >5 (<0.2) based on direct evidence with no major threats to validity (+2)
Evidence of a dose-response gradient (+1)
All plausible confounders would have reduced the effect (+1)

Source: Adapted from Atkins D, Best D, Briss PA, et al. Grading
quality of evidence and strength of recommendations. BMJ 2004;
328:1490. Reprinted by permission.
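
The criteria in Table 6.7 amount to simple arithmetic on a four-step scale, which the following sketch illustrates. The numeric scale (1 = very low to 4 = high), the clamping of the result to that range, and the name `grade_quality` are assumptions of this sketch. The starting grades and the sizes of the adjustments are taken from the table.

```python
QUALITY_LABELS = {1: "very low", 2: "low", 3: "moderate", 4: "high"}

# Starting grade by study design, following Table 6.7.
STARTING_GRADE = {
    "randomized trial": 4,
    "quasi-randomized trial": 3,
    "observational study": 2,
    "other": 1,
}


def grade_quality(design, decreases=(), increases=()):
    """Return the quality of evidence after applying adjustments.

    `decreases` and `increases` hold the point values from the
    table, for example (1, 1) for a serious limitation to study
    quality plus imprecise data. The score is clamped to the
    four available grades (an assumption of this sketch).
    """
    if design not in STARTING_GRADE:
        raise ValueError(f"unknown study design: {design}")
    score = STARTING_GRADE[design] - sum(decreases) + sum(increases)
    score = max(1, min(4, score))
    return QUALITY_LABELS[score]


if __name__ == "__main__":
    # Randomized trial with a serious quality limitation and sparse data.
    print(grade_quality("randomized trial", decreases=(1, 1)))
    # Observational evidence with a strong association.
    print(grade_quality("observational study", increases=(1,)))
```

Under these rules a flawed randomized trial can end up at the same grade as, or below, well-conducted observational evidence, which is consistent with the point that the hierarchy of study design alone does not settle the quality of evidence.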

In addition, the hierarchy is not absolute. If the treatment
effects are sufficiently large and consistent, for instance,
observational studies may provide more compelling evidence
than most randomized controlled trials. At the same time,
instances in which the results from a rando-


Conclusions 


The hierarchy of evidence implies a clear course of action 
for surgeons addressing patient problems; however, tak- 
ing into consideration the methodological quality of the 
studies may be challenging. Surgeons should look for the 
highest available evidence from the hierarchy. The evi-


Randomized and Nonrandomized Studies


“In the field of observation, chance favors only the prepared mind.”

— Louis Pasteur


Summary


Observational studies are regarded as inferior to
experimental studies by many surgeons; therefore, their
results are believed to be less convincing. However,
observational studies are easier to perform and thus are
commonly done. In this chapter, investigations of “real-life
topics” with observational as well as experimental study
designs are compared.


Introduction


As opposed to experimental studies that include random
allocation of a patient to certain interventions, in
observational studies no random allocation is made and a
group (single cohort = case series) or multiple groups
(multiple cohorts) are followed over time and assessed for
the development of an outcome. Cohort studies can include
historical controls or ideally involve a concurrent
selection of controls. In addition, observational studies
also include case-control studies in which the compared
groups are defined based on outcome. Cohort studies can be
prospective or retrospective, whereas a case-control study,
by its nature, is always retrospective.


Jargon Simplified: Cohort Study 

A cohort study is a “prospective investigation of the fac- 
tors that might cause a disorder in which a cohort of 
individuals who do not have evidence of an outcome of 
interest but who are exposed to the putative cause are 
compared with a concurrent cohort who are also free 
of the outcome but not exposed to the putative cause.
Both cohorts are then followed to compare the incidence
of the outcome of interest.”


Experimental versus Observational Study Design 


There is an ongoing debate about whether the results of 
nonrandomized studies are consistent with the results de- 
rived from randomized controlled trials on the same topic. 
Clearly, the gold standard for comparing two different
interventions is a randomized controlled trial (RCT).
Randomized studies are considered experimental studies,
whereas nonrandomized studies fall into the observational
study design. The purpose of randomization is to balance
known and unknown prognostic factors influencing the
outcome. Therefore, differences in outcome can be attributed
to one of the two interventions that are being compared.
Although some techniques exist to deal with known
prognostic factors in nonrandomized studies, randomization
is the only method to balance unknown prognostic factors
that might influence the outcome. An imbalance in prognostic
factors between groups can bias the results in favor of 
one of the interventions that are being investigated. 
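
A small simulation can illustrate why randomization balances a prognostic factor that the investigator never records. The sketch below is hypothetical throughout: a 30% prevalence of smoking, a coin-flip allocation, and a nonrandom allocation in which smokers are steered away from the new treatment. The function names and probabilities are assumptions chosen for illustration.

```python
import random


def randomized(smoker, rng):
    """Allocate by coin flip, ignoring the prognostic factor."""
    return "treatment" if rng.random() < 0.5 else "control"


def surgeon_preference(smoker, rng):
    """Nonrandom allocation: smokers rarely receive the new treatment."""
    p_treatment = 0.2 if smoker else 0.6
    return "treatment" if rng.random() < p_treatment else "control"


def prevalence_gap(n_patients, allocate, seed=0):
    """Return the absolute difference in smoking prevalence between arms."""
    rng = random.Random(seed)
    arms = {"treatment": [], "control": []}
    for _ in range(n_patients):
        smoker = rng.random() < 0.30
        arms[allocate(smoker, rng)].append(smoker)
    rate_treatment = sum(arms["treatment"]) / len(arms["treatment"])
    rate_control = sum(arms["control"]) / len(arms["control"])
    return abs(rate_treatment - rate_control)


if __name__ == "__main__":
    print("randomized gap:", round(prevalence_gap(10000, randomized), 3))
    print("nonrandom gap: ", round(prevalence_gap(10000, surgeon_preference), 3))
```

With random allocation the two arms end up with nearly the same proportion of smokers, even though the allocation never looks at smoking status. With the preference-based allocation the arms differ markedly, so any difference in outcome could reflect smoking rather than the treatment.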

If data from RCTs are less prone to bias, why not rely on
RCT data only? Why do we need observational studies? There
are multiple reasons. Most important, not all research
questions can be answered by a clinical trial or need to be
answered by a clinical trial. Evaluating the influence of
multiple prognostic factors and their interaction on an
outcome of interest, for example, the influence of age,
gender, comorbidities, smoking, type of treatment, type of
rehabilitation, and so on, on posttraumatic arthritis,
cannot be done with a trial design. This would require a
prospective cohort study. Although you could design an RCT
to compare different treatments or different rehabilitation
protocols, you cannot do that with factors that you cannot
influence, such as age or gender. It is also clearly
unethical to assign patients to smoking or nonsmoking
groups. Furthermore, in some situations, an RCT study design
might be possible, but not feasible, for example, when
looking at rare outcomes or outcomes that take too long to
develop, such as arthritis. Case-control studies might be
more appropriate for those study types.

However, in many cases, RCTs as well as observational
studies like cohort studies can be designed to answer a
specific research question, for example, to compare two
different treatment options for hip fracture fixation.
Although RCTs are preferable for limiting bias, there are
also downsides to RCTs: they can be extremely expensive and
their generalizability is often limited because they are
usually performed in large university centers as opposed to
small community hospitals. Available resources and surgeons’
expertise might differ between the two settings; therefore,
results based on data from large university centers might
not be generalizable to small community hospitals.
Furthermore, in RCTs groups are deliberately balanced to
evaluate the effect of a single aspect on the outcome
parameter, whereas in the real world a particular treatment
effect may be caused by a combination of factors. Thus,
outcomes following treatment may differ from those expected
from RCT findings.


Jargon Simplified: Experimental Study Design

In an experimental study design, as opposed to an
observational study, treatment allocation is performed in a
randomized manner to limit confounding bias.


Disadvantages of Observational Studies


Criticism of observational studies is largely based on the
nonrandom allocation to interventional and control groups
and the resulting biases. Unrecognized confounding factors
may distort results, leading to over- or underestimation of
the treatment effect.

Observational studies have a higher potential of being
unsystematic; therefore, recommendations based on cohort and
case-control studies are much weaker.

Sackett et al stated that observational studies are not
reliable and should not be funded. Furthermore, they
suggested, “if you find that a study was not randomized, you
should stop reading it and go on to the next article.”


Comparison of Data from Randomized Trials and Observational Studies in Literature 


Multiple reviews in the literature have compared the results
of RCTs with the results of observational studies on a
multitude of topics. Generally, authors appear to have
strong a priori views about the usefulness of evidence from
observational studies. Critics of observational studies
often cite Sacks et al from 1982 to support their view. In
this review,
the authors compared six new therapies that have been 
evaluated by RCTs (n = 50) to cohort studies with historical 
controls (n = 56). The rates of the assessed outcome para- 
meters were similar in both study designs. However, the 
new therapy (intervention group) was effective in 44 of 
56 studies when a cohort study design was chosen, 
whereas it was only effective in 10 of 50 RCTs. The authors 
concluded that biases in patient selection might have 
weighted the outcome of cohort studies in favor of new 
therapies. However, when comparing RCT results with re- 
sults from cohort studies with a concurrent selection of 
controls (instead of using historical controls), two more
recent reviews found no significant differences in the
magnitudes of the treatment effect between RCTs and cohort
studies. Benson and Hartz identified 136 observational
studies for 19 treatment comparisons and the corresponding
RCTs. In only 2 of 19 comparisons did the treatment effect 
derived from the observational study data lie outside the 
95% confidence interval (CI) derived from RCTs. The 
authors concluded that there is little evidence to suggest 
that observational studies overestimate the treatment
effect. Similar results were reported by Concato et al, who
evaluated five clinical topics in 99 reports. However, only
cohort studies with concurrent selection of controls and 


case-control designs were included; cohort studies with 
historical controls were not considered. Point estimates
derived from cohort studies were similar to those from RCTs,
with less variability resulting in higher precision
(narrower 95% CIs). The authors concluded that the results of
well-designed observational studies (with either a cohort 
or a case-control design) do not systematically overesti- 
mate the magnitude of the effects of treatment as com- 
pared with those in randomized, controlled trials on the 
same topic. Those results were confirmed in a review by 
Ioannidis et al who evaluated 45 topics for which both 
randomized and nonrandomized studies have been per- 
formed and considered in meta-analyses of dichotomous 
outcomes. There was a high correlation (r = 0.75) between 
odds ratios of randomized and nonrandomized studies. 
However, nonrandomized studies had a tendency to 
show larger treatment effects (28 versus 11). Despite con- 
cluding that differences in estimated magnitude of treat- 
ment effect are very common between randomized and 
nonrandomized studies, the authors also pointed out 
that there were higher rates of discrepancy in studies in- 
volving historical controls and the odds of having a discre- 
pancy decreased when publications were more recent. 
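
The two kinds of comparison reported above can be reproduced with basic arithmetic. The sketch below uses the counts quoted from Sacks et al (44 of 56 cohort studies and 10 of 50 RCTs found the new therapy effective) and then shows the type of check described for Benson and Hartz, namely whether an observational estimate lies within the 95% CI derived from RCTs. The function names and the numbers in the second example are hypothetical.

```python
def proportion(successes, total):
    """Share of studies in which the new therapy was found effective."""
    if total <= 0:
        raise ValueError("total must be positive")
    return successes / total


def lies_within_ci(point_estimate, ci_lower, ci_upper):
    """True if an observational estimate falls inside the RCT-derived CI."""
    return ci_lower <= point_estimate <= ci_upper


if __name__ == "__main__":
    # Counts quoted in the text from Sacks et al.
    cohort_share = proportion(44, 56)
    rct_share = proportion(10, 50)
    print(f"cohort studies with historical controls: {cohort_share:.1%}")
    print(f"randomized controlled trials: {rct_share:.1%}")

    # Hypothetical relative risk from observational data compared
    # with a hypothetical 95% CI from RCTs on the same question.
    print(lies_within_ci(0.80, ci_lower=0.65, ci_upper=0.95))
```

A check of this kind says only whether two estimates are compatible. It does not show that either study design is free of bias.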

MacLehose et al investigated the association between
methodological quality and the magnitude of estimates 
of effectiveness by comparing estimates of effectiveness 
derived from RCTs and observational studies. They found 
14 meta-analyses with 38 comparisons - 25 of low quality 
and 13 of high quality. Effect-size discrepancies between 
RCTs and observational studies were lower in high-quality 
studies.


Examples from the Literature: An Example of a
Cost-Benefit Analysis

Source: Benson K, Hartz AJ. A comparison of observational
studies and randomized, controlled trials. N Engl J Med
2000;342(25):1878-1886.

Abstract 

Background: For many years, it has been claimed that 
observational studies find stronger treatment effects 
than randomized, controlled trials. We compared the re- 
sults of observational studies with those of randomized, 
controlled trials. 

Methods: We searched the Abridged Index Medicus and 
Cochrane databases to identify observational studies re- 
ported between 1985 and 1998 that compared two or 
more treatments or interventions for the same condition. 
We then searched the Medline and Cochrane databases to 
identify all the randomized, controlled trials and obser- 
vational studies comparing the same treatments for 
these conditions. For each treatment, the magnitudes 
of the effects in the various observational studies were 


Conclusion 


There continues to be a debate about the validity of data 
derived from observational studies. Based on recent data, 
results derived from high-quality observational studies 
yield similar magnitudes of effect compared with RCTs. 
The main features of high-quality observational studies 
are the use of concurrent controls as opposed to historical 
controls; a prospective study design if possible; the
selection of consecutive patients with clearly defined
exclusion and inclusion criteria; and similar to RCTs,
blinding of
whoever can be blinded (patient, investigator, outcome as- 
sessor, data analyst) whenever possible; and a high follow- 
up rate. Balance of known prognostic factors can be


The Clinical Case Series


“Attempt easy tasks as if they were difficult, and difficult as if they were easy.”

— Baltasar Gracian


Summary


In this chapter, the role of case series within the
hierarchy of study designs is described. The advantages and
disadvantages of clinical case series are highlighted.


Introduction


In the hierarchy of evidence, case series are considered
level IV evidence. Case series do not have a comparison
group; this sets them apart from the evidence levels above:
randomized controlled trials (RCTs), cohort studies, and
case-control studies. The lack of a comparison group makes
data derived from case series more prone to bias,
specifically selection bias; therefore, they are ranked
after RCTs (level I), cohort studies (level II), and
case-control studies (level III), and just above expert
opinion, which is considered level V evidence.


Jargon Simplified: Selection Bias 

“Selection bias occurs when the outcomes of a trial are 
affected by systematic differences in the way in which 
individuals are accepted or rejected for a trial, or in the 
way in which the interventions are assigned to indivi- 
duals once they have been accepted into a trial.”! 


The majority of clinical studies in the medical literature
are case series. In an assessment of 51 consecutive articles
published in the American Journal of Bone and Joint Surgery,
57% of all articles were case series. It is therefore
important to understand the advantages and the drawbacks of
case series.


Potential Advantages and Disadvantages of Case Series 


The main advantages of case series are that they are easier
to do than RCTs, cohort studies, or case-control studies,
require fewer financial resources, and are less
time-consuming, especially compared with RCTs. However,
investigators as well as readers have to be aware that there
are severe limitations to conclusions drawn from case
series, mainly due to selection bias. Preselected cases may
bias the results in the direction of the investigators’
preconceptions and also severely limit the generalizability
of the findings.

If selection bias is such an important issue, why are case
series required? Why not start with an RCT or any other
higher-level study right away? One major problem of
randomized trials is expertise bias - surgeons involved in a
trial might have a high level of expertise with one procedure,
but only limited experience with the other procedure
investigated - usually the novel procedure. Results of an RCT
can therefore be biased against the newer procedure. Case
series are useful for evaluating and refining new techniques
before they are investigated in a trial. At the same time,
surgeons can gain some initial experience, and expertise bias
in future studies can then be limited by “taking the learning
curve out of the equation.” Case series are also helpful for
establishing a treatment protocol and assessing the feasibility
of more advanced studies. Additionally, baseline data for
sample size calculations for comparative studies can be
obtained.


Jargon Simplified: Expertise Bias 

Expertise bias is the bias that can occur when a surgeon
involved in a trial has a high level of expertise with one
procedure but only limited experience with the other
procedure investigated - usually the novel procedure.


Key Concepts: Features of a Well-Designed Case Series

• A priori defined study protocol

• Clear inclusion and exclusion criteria

• Prospective data collection

• Consecutive patient enrollment

• High follow-up rate

• Clinically relevant outcome measures


Conclusions


Features of a well-designed case series that help to limit
selection bias and other biases include an a priori defined
study protocol with clear exclusion and inclusion criteria,
prospective data collection of consecutive patients, blinding
of patients, independent blinded outcome assessment
and data analysis, the use of clinically relevant, valid, and
reliable outcome measures, a high follow-up rate, and strict
protocol adherence. Case series are useful for evaluating
new techniques and procedures before higher-level studies
are considered.


9
The Case-Control Study


“Opinion is that exercise of the human will which helps us to make a decision without
information.”

- John Erskine


Summary


The case-control study is the focus of this chapter. This
type of study begins with the identification of individuals
who already have the outcome of interest - the cases,
which are compared with a suitable control group without
the outcome event - the controls. An overview of the
selection of cases and controls, data analyses, and the
limitations of a case-control study is presented. Details on
the calculation of the odds ratio are also given.


Introduction


A case-control study is a type of observational study. In the
case-control study, a group of individuals who already
have the outcome of interest, referred to as cases, is
compared with a suitable control group without the outcome
of interest. A case-control study is a descriptive research
method that involves comparing the characteristics of the
group of interest, such as a group with a specific disease or
who have been exposed to a particular risk factor (cases),
with a comparison or reference group without the
characteristic of interest or the disease or condition
(controls).1 The aim of the comparison is to identify factors
that occur more or less often in the cases in comparison
with the controls, to determine if the factors increase or
decrease the risk of the disease or condition. The analysis
will lead to the calculation of an odds ratio, which is an
estimate of the contribution of a factor to disease.1 The
purpose of this chapter is to describe the case-control study
design.


The Case-Control Study Defined


In a case-control study, investigators compare cases and
controls for the presence of risk factors. Case-control
studies are retrospective in nature, as they include a passage
of time and assess past characteristics and exposures in
cases versus controls. The cases have the disease or
characteristic of interest, whereas the controls do not. The
investigators ask about the history of contact with or exposure
to suspected causes and compare the exposure rates
between the cases and the controls. The cases should have
a higher frequency or degree of exposure, if the exposure
under investigation is to be deemed a contributing factor.
If controls are well chosen, the only difference between
the cases and the controls will be the history of contact
with or exposure to suspected causes (Fig. 9.1, Table 9.1).


Jargon Simplified: Case-Control Study

A case-control study is “a study designed to determine
the association between an exposure and outcome in
which patients are sampled by outcome (some patients
with the outcome of interest are selected and compared
with a group of patients who have not had the outcome),
and the investigator examines the proportion of patients
with the exposure in the two groups.””


Fig. 9.1 Case-control study design.


Table 9.1 The Case-Control Study

Cases                                               Controls
Have the condition or health outcome of interest    Do not have the health condition
Have a higher frequency or degree of exposure       Serve as a control group

- Ask about the history of contact with or exposure to suspected causes.
- If controls are well chosen, the only difference between the cases and the controls will be the history of contact with or exposure to suspected causes.


Key Concepts: Retrospective in Nature

• Compare cases and controls for the presence of risk
factors

• Retrospective in nature

• Includes passage of time

• Assesses past characteristics and exposures in cases
versus controls


Selection of the Cases 


Cases are usually selected from among persons seeking
medical care for the disease(s) under study. Newly
diagnosed cases (incident cases) are preferable: if an
individual has had the disease for a long period (a prevalent
case), it will be more difficult for the individual to remember
exposures and to distinguish exposures that preceded
the disease from those that occurred after the disease had
developed, and thus harder to distinguish between cause
and effect.? In addition, using individuals who have had the
disease for a long period can lead to an overrepresentation
of cases of long duration because those who die of the
disease or who are rapidly cured have a lower probability
of inclusion.?

The time of diagnosis is often considered the time of
onset.” For example, for a fracture, for which treatment is
generally sought rapidly, the time of diagnosis is a reasonably
good approximation of the time of onset.? For many other
diseases, such as osteoporosis or arthritis, the actual time
of onset is unknown or not clearly defined. Therefore,
when trying to identify risk factors for a disease, it is
important to make sure that the exposure at least preceded
the onset of the symptoms because this will render it
unlikely that the person changed their exposure status as a
result of the disease.* Diagnostic criteria should be clearly
defined prior to the selection of the cases. The most
common way of selecting cases is to include all incident cases,
as opposed to prevalent cases, in a defined population over
a specific period, for example, including all new fracture
cases who present to a local hospital over a defined
period.’


Jargon Simplified: Incident Cases 

“Incident cases are new occurrences of a condition (or 
disease) in a population over a period of time. Incidence 
refers to the number of new cases of disease occurring 
during a specified period of time; expressed as a percen- 
tage of the number of people at risk.”4 


Jargon Simplified: Prevalent Cases 

Prevalence refers to the total number of people with a
disease or condition in a certain population at a certain
time. This includes both people who are newly diagnosed
and those who have had the disease or condition for a
long time. Prevalent cases must have been incident cases
at some earlier point.
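
As a small illustration of the two definitions above, the following sketch computes incidence and prevalence as percentages for a hypothetical population. All counts are invented for illustration, the function names are assumptions rather than terms from the text, and the prevalence figure ignores deaths and cures during the period.

```python
def incidence_percent(new_cases: int, population_at_risk: int) -> float:
    """New cases during a period, as a percentage of the people at risk."""
    if population_at_risk <= 0:
        raise ValueError("population_at_risk must be positive")
    return 100.0 * new_cases / population_at_risk


def prevalence_percent(existing_cases: int, new_cases: int, population: int) -> float:
    """All cases (long-standing plus newly diagnosed) as a percentage of the population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * (existing_cases + new_cases) / population


# Hypothetical numbers: 10,000 people, 300 of whom already have the disease
# at the start of the year, and 50 new diagnoses during the year.
population = 10_000
existing = 300
new = 50

# Only people without the disease at the start are at risk of becoming new cases.
print(incidence_percent(new, population - existing))
# Prevalent cases include both the long-standing and the newly diagnosed.
print(prevalence_percent(existing, new, population))
```

In a case-control study, restricting cases to the 50 new diagnoses corresponds to using incident cases, whereas sampling from all 350 would mix in prevalent cases.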


Selection of the Controls 


The control group is intended to provide an estimate of
exposure to risk in the population from which the cases are
drawn; therefore, controls should be drawn from the same
population, with the only difference between the two groups
being the exposure. Controls enter the study as they are
identified. If controls are not appropriately selected, the
validity of the study may be limited.

Ideally, controls would satisfy two requirements. Within
the constraints of any matching criteria, their exposure to
risk factors and confounders should be representative of
that in the population at risk of becoming cases.° Also,
the exposures of controls should be measurable with
similar accuracy to those of the cases.° Often, it proves
impossible to satisfy both of these aims.° Examples of potential
controls include population controls, neighborhood
controls, hospital or registry controls, medical practice
controls, friend controls, and relative controls. Two sources
of controls are commonly used. Controls selected from
the general population have the advantage that their
exposures are likely to be representative of those at risk of
becoming cases.? However, assessment of their exposure
may not be comparable with that of cases, especially if
the assessment is achieved by personal recall. Cases are
keen to find out what caused their illness and are therefore
better motivated to remember details of their past than
controls with no special interest in the study question.
Measurement of exposure can be made more comparable by
using patients with other diseases as controls, especially
if research participants are not told the exact focus of study.°
However, their exposures may be unrepresentative.°

When cases and controls are both freely available, then
selecting equal numbers will make a study most efficient.”
However, the number of cases that can be studied is often
limited by the rarity of the disease under investigation.’ In
this circumstance, statistical confidence can be increased by
taking more than one control per case.° There is, however, a
law of diminishing returns, and it is usually not worth going
beyond a ratio of four or five controls to one case.


What Case-Control Studies Tell Us


Case-control studies inform us about different experiences
people have that may be related to a particular condition.
They also tell us the odds of having one experience
compared with other(s) for sick people and healthy people.
Case-control studies cannot tell us about cause because
they are retrospective in nature, as they depend on past
memories and records. It is important to remember that
cases and controls may be different in more ways than
health status.


Analysis of Data in Case-Control Studies


The case group is compared with the control group on the
proposed causal factors and a statistical analysis is done to
estimate the strength of association of each factor with the
studied outcome.’ From the data collected on the cases and
the controls, an odds ratio can be calculated to describe the
odds of a particular factor in individuals with the outcome
of interest compared with those without the outcome.’


Jargon Simplified: Odds Ratio

“The odds ratio is a ratio of the odds of an event in an
exposed group to the odds of the same event in a group
that is not exposed.”


Key Concepts: Odds Ratio

• A measure of the degree of association

• The odds of exposure among the cases compared with
the odds of exposure among the controls

• Takes values between zero and infinity

• One is the neutral value, which means that there is no
difference between the groups compared.

• A value close to zero or infinity means that there is a
large difference between the groups compared.


Table 9.2 illustrates the basic method of calculating the
odds ratio in a case-control study.”

The odds ratio (OR) or the ratio of odds of exposure is
thus given by (a/c)/(b/d) (or ad/bc).


Table 9.2 Calculation of the Odds Ratio

Exposure            Disease
                    Yes (cases)    No (controls)
Yes                 a              b
No                  c              d
Odds of Exposure    a/c            b/d


In a hypothetical example, an investigator would like to
determine if patients with bone cancer have higher
exposure levels to radiotherapy than people who do not have
bone cancer. The cases are newly diagnosed patients with
bone cancer at a local cancer center and the controls are
age- and sex-matched individuals without bone cancer.
Each group is asked about prior exposure to radiotherapy.
In the analysis, the investigator calculates an odds ratio
based on the following data (Table 9.3).


Table 9.3 Example of a Calculation of the Odds Ratio

Exposure            Disease
                    Yes (cases)    No (controls)
Yes                 30             10
No                  20             40
Odds of Exposure    30/20          10/40


Substituting these values into ad/bc gives:


OR = (30 × 40) / (20 × 10) = 6


In the above hypothetical example, an odds ratio of 6 is
calculated. This indicates that the odds of exposure to
radiotherapy are six times higher in cases than in controls
and that bone cancer may be associated with radiotherapy.
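
The calculation in Tables 9.2 and 9.3 can be sketched in a few lines of Python. The point estimate follows the ad/bc formula given above. The confidence interval is an addition for illustration only: it uses the common log-based approximation (often called Woolf's method), which the chapter does not describe, and the function names are assumptions.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio ad/bc for a 2x2 table laid out as in Table 9.2.

    a: exposed cases, b: exposed controls,
    c: unexposed cases, d: unexposed controls.
    """
    if b == 0 or c == 0:
        raise ZeroDivisionError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


def woolf_confidence_interval(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Approximate confidence interval from the standard error of log(OR).

    Not part of the chapter's method; shown only as one common approximation.
    z = 1.96 corresponds to an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this approximation")
    log_or = math.log(odds_ratio(a, b, c, d))
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or - z * standard_error),
            math.exp(log_or + z * standard_error))


# Data from Table 9.3: 30 exposed cases, 10 exposed controls,
# 20 unexposed cases, 40 unexposed controls.
or_value = odds_ratio(30, 10, 20, 40)
low, high = woolf_confidence_interval(30, 10, 20, 40)
print(f"OR = {or_value:.1f}, approximate 95% CI {low:.2f} to {high:.2f}")
```

With the small counts in this hypothetical table the interval is wide, which is a reminder that an odds ratio from a small study is an imprecise estimate.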


Other statistical analysis techniques used when analyzing
data from a case-control study are multiple regression
and logistic regression.® Multiple regression analysis
assigns each hypothetical causal variable an estimated
independent strength of association with the effect being
measured (i.e., what the correlation might be if all the other
“causal variables” were identical except for the one being
calculated), and an estimated confidence interval (i.e., the
region within which the actual value of correlation might
be expected to lie). The result may be a positive
association, where the variable increases the chances of seeing
the effect in question; negative, if the variable decreases
the frequency of the effect; or zero, if the variable has no
association with the effect, positive or negative. Usually,
the variables used to select controls are included in the
regression, to check on whether they were correctly
balanced between the case and control populations.


Jargon Simplified: Multiple Regression

“Multivariable Regression is a type of regression that
provides a mathematical model that explains or predicts
the dependent or target variable by simultaneously
considering all of the independent or predictor variables.”4


When to Use Case-Control Studies


The case-control study can be useful in studying rare
outcomes, outcomes with multiple potential etiologic factors,
or outcomes that take a considerable length of time to
develop. Case-control studies have traditionally been used
when it is not possible to manipulate the intervention, such
as an environmental exposure (e.g., asbestos dust, a landfill
site, or power lines). They are also appropriate for
investigating a rare side effect associated with an
intervention. Case-control studies are best suited
to the study of diseases for which medical care is sought,
such as cancers or hip fractures. Otherwise, an expensive
community survey would need to be undertaken to
identify cases, thus canceling out one of the advantages of
case-control studies - the relatively low cost.


Key Concepts: Why Use a Case-Control Study

• Rare outcomes

• Outcomes that take a long time to develop

• Outcomes that are not feasible to study with a cohort
study design

• Simple to conduct

• Inexpensive

• Cost-effective method to study a rare disease


Case-control studies can be conducted in a short time with
small sample sizes, and for less money than many other
types of research studies. They are also advantageous
when the outcome of interest is very rare or the duration of
follow-up needed to detect the outcome of interest is long.


Key Concepts: Advantage of a Case-Control Study
The main advantage of the case-control study is that it
enables us to study rare health outcomes without having
to follow thousands of people, and is therefore generally
quicker, cheaper, and easier to conduct than the cohort
study.


Limitations of Case-Control Studies


Case-control studies are considered retrospective in
nature because they look back in time at information collected
about past exposure to possible attributable factors.
Because the information is usually collected from patients'
memories or their medical records, data may be inaccurate
because of the tendency of those with a disease or poor
outcome to ascribe this outcome to exposures (called recall
bias), and because of missing or biased information in
medical records. For example, people with a disease may
overreport exposure due to guilt, knowledge, or selective
memory. In addition, case-control studies do not provide
information about the absolute risk of an adverse event, but
only about the relative odds.

Case-control studies suffer from the limitations of
potential selection bias among participants, and extraneous
confounding variables may explain any observed
differences between the cases and the controls.1 In addition,
case-control studies are limited to cases who have
survived. This is termed selective survival because
subjects with less severe forms of the disease end up in
the study.


Jargon Simplified: Selection Bias 

Selection bias is “a systematic error in creating interven- 
tion groups, causing them to differ with respect to prog- 
nosis. That is, the groups differ in measured or unmea- 
sured baseline characteristics because of the way in 
which participants were selected for the study or as- 
signed to their study groups.”® 


Jargon Simplified: Recall Bias 

“Recall bias occurs when patients who experience an ad- 
verse outcome have a different likelihood of recalling an 
exposure than the patients who do not have an adverse 
outcome, independent of the true extent of exposure.”” 


Jargon Simplified: Confounding Variables 

“A confounding variable is a factor that distorts the true 
relationship of the study variable of interest by virtue of 
also being related to the outcome of interest. A con- 
founding variable may mask an actual association or it 
may falsely demonstrate an apparent association be- 
tween the study variables where no real association be- 
tween them exists.”” 


Another potential limitation in case-control studies is
overmatching of cases and controls. Overmatching refers
to controlling out the variable of interest by rigorously
matching on many variables when selecting the cases and
controls.1 In other words, the control group is so closely
matched to the case group that the exposure distributions
of the two groups become very similar.


Key Concepts: Limitation of the Case-Control Study
The primary limitation of the case-control study is that
there is bias in the retrospective documentation of risk
and there is no control for confounding variables.


Examples from the Literature: Example of a 
Case-Control Study 

Source: Vestergaard P, Rejnmark L, Mosekilde L. Fracture 
risk associated with use of nonsteroidal anti-inflamma- 
tory drugs, acetylsalicylic acid, and acetaminophen and 
the effects of rheumatoid arthritis and osteoarthritis. 
Calcif Tissue Int 2006;79(2):84-94,'° 

Abstract: We studied the effects of various nonmor- 
phine pain medications as well as rheumatoid arthritis 
and osteoarthritis on fracture risk in a nationwide 
case-control study. Cases were all subjects with any frac- 
ture sustained during the year 2000 (n = 124,655) in 
Denmark. For each case, three controls (n = 373,962) 
matched on age and gender were randomly drawn 
from the background population. The primary exposure 
variables were use of acetaminophen, nonsteroidal anti- 
inflammatory drugs (NSAIDs), or acetylsalicylic acid 
(ASA). Adjustments were made for several confounders. 
The effect of dose was examined by stratifying for cumu- 
lated dose (defined daily dose, DDD). For acetamino- 
phen, a small increase in overall fracture risk was ob- 
served with use within the last year (odds ratio [OR] = 
1.45, 95% confidence interval [CI] 1.41-1.49). For ASA, 
no increase in overall fracture risk was present with re- 
cent use. Significant heterogeneity was present for the 
NSAIDs; e.g., ibuprofen was associated with an increased 
overall fracture risk (OR = 2.09, 95% CI 2.00-2.18 for <20 
DDD), while celecoxib was not (OR = 0.76, 95% CI 0.51- 
1.13 for <20 DDD, 2p < 0.01 for comparison).
Osteoarthritis was associated with a decreased risk of any
fracture if the diagnosis had been made more than 1 year
ago (OR = 0.70, 95% CI 0.67-0.72). Rheumatoid arthritis 
was associated with an increase in overall fracture risk 
if the diagnosis had been made within the last year 
(OR = 1.86, 95% CI 1.68-2.07). Weak analgesics may be 
associated with fracture risk in a varying way. The effects 
in most cases were small. Falls may be one reason for the 
increase in fracture risk with some NSAIDs. 
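
One way to read the odds ratios quoted in this abstract is to check whether each 95% confidence interval includes the neutral value of 1. The sketch below applies that check to the figures reported above. The function name and the wording of the labels are assumptions made for illustration, and the check says nothing about confounding or the size of the effect.

```python
def describe_odds_ratio(odds_ratio: float, ci_low: float, ci_high: float) -> str:
    """Classify an odds ratio by whether its confidence interval includes 1."""
    if not ci_low <= odds_ratio <= ci_high:
        raise ValueError("the point estimate should lie within its interval")
    if ci_low > 1:
        return "increased odds"
    if ci_high < 1:
        return "decreased odds"
    return "no clear difference (interval includes 1)"


# Odds ratios and 95% confidence intervals as reported in the abstract.
reported = {
    "acetaminophen, use within the last year": (1.45, 1.41, 1.49),
    "ibuprofen, <20 DDD": (2.09, 2.00, 2.18),
    "celecoxib, <20 DDD": (0.76, 0.51, 1.13),
    "osteoarthritis, diagnosed more than 1 year ago": (0.70, 0.67, 0.72),
    "rheumatoid arthritis, diagnosed within the last year": (1.86, 1.68, 2.07),
}

for exposure, (or_value, low, high) in reported.items():
    label = describe_odds_ratio(or_value, low, high)
    print(f"{exposure}: OR {or_value} ({low}-{high}) -> {label}")
```

The celecoxib interval (0.51-1.13) spans 1, which matches the abstract's statement that celecoxib was not associated with an increased overall fracture risk.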
Conclusions


Although the case-control design presents difficulties in 
determining the order of causation, a well-designed and 
executed case-control study can provide scientific data 
that are quite valuable. The inferences made from a case- 
control study are weaker than for a randomized controlled 
trial or a prospective cohort study. Nevertheless, the data 
obtained from a case-control study can be used for the 
planning of future research. A case-control design is a 
cost-effective choice when exploring rare outcomes, or 
those that take a long time to develop (Table 9.4).


10
The Prospective Cohort Study


“We can be absolutely certain only about things we do not understand.”

- Eric Hoffer


Summary


This chapter describes how the prospective cohort study is
used in orthopaedic surgery research. The advantages
and disadvantages of this type of study design are given
and examples of prospective and retrospective cohort
studies are provided.


Introduction


In epidemiological studies, the prospective cohort study
compares a group of individuals who have a specified
exposure characteristic (the exposed cohort) with a group
of individuals without the identified exposure
characteristics (the unexposed cohort) over a long period. A
prospective cohort study is a descriptive research method that
involves following two cohorts, differentiated by exposure
characteristics, over time. The aim of the comparison is to
identify if disease or poor outcome occurs more or less
often in the exposed cohort in comparison with the
unexposed cohort, to determine if the exposure characteristics
increase or decrease the risk of the disease or condition.
In this chapter the prospective cohort study design and its
use in orthopaedic surgery research are described
(Fig. 10.1).


Prospective Cohort Studies in Orthopaedic Surgery Research 


In a prospective cohort study, investigators compare
exposed and unexposed cohorts over a specified period for
the presence of disease. Prospective cohort studies begin
with the selection of a group of individuals who do not
have the disease or outcome of interest. The first step is
to identify all eligible patients, obtain their informed
consent, and then enter them into the study. The investigator
allows the “natural process” to determine which treatment
the patient receives. For instance, in a surgical prospective
cohort study comparing nonoperative versus operative
management for the treatment of a fracture, the patient’s
surgeon decides whether the patient receives nonoperative
or operative treatment. All patients are then followed
over time and outcomes are compared.


Jargon Simplified: Cohort 

“A cohort is a group of people who share a common char- 
acteristic or experience within a defined time period 
(e.g., are born, leave school, lose their job, are exposed 
to a drug or a vaccine, etc.) (Fig. 10.2).”! 


Retrospective versus Prospective Cohort Studies 


Cohort studies can be either retrospective or prospective. 
A prospective cohort study defines the groups before the 
study is done and follows the groups forward in time, 
whereas a retrospective cohort study does the grouping 
after the data are collected. An example of a retrospective 
cohort study is completing a chart review comparing two 
surgical techniques. 





Fig. 10.1 Epidemiological cohort


Jargon Simplified: A Prospective Study

A prospective study defines the groups before the study
is done and follows the patients forward in time.


Jargon Simplified: A Retrospective Study

A retrospective study defines the groups after the data
have already been recorded, as in a chart review.


Examples from the Literature: Example of a
Retrospective Cohort Study

Source: Vitale MG, Stazzone EJ, Gelijns AC, Moskowitz AJ, 
Roye DP Jr. The effectiveness of preoperative erythro- 
poietin in averting allogenic blood transfusion among 
children undergoing scoliosis surgery. J Pediatr Orthop 
B 1998;7:203-209.” 

Abstract: Concerns about the transmission of the hu- 
man immunodeficiency virus (HIV) have driven the evo- 
lution of surgical transfusion practices including the use 
of preoperative erythropoietin (rhEPO). Although there 
is significant experience documenting the efficacy of 
preoperative rhEPO in reducing transfusion require- 
ments for adult patients, there is little experience in 
the pediatric population. With 178 pediatric patients 
who underwent surgery for spinal deformity, a retro- 
spective cohort study was performed using patient 
charts, administrative records, and blood bank computer 
data. Of these patients, 44% received erythropoietin and 
55% did not. From the entire population, 17.5% were in 
the rhEPO treatment group that received homologous 
blood transfusion compared with 30.6% in the untreated 
group (P < 0.05). Among the children with idiopathic 
scoliosis, this effect was more pronounced, with 3.9% 
of rhEPO patients receiving blood transfusion compared 
with 23.5% of nontreated patients (P = 0.006). Addition- 
ally, rhEPO treatment was associated with a significantly 
decreased length of stay only for patients in the idio- 
pathic group (9.3 versus 6.7, P = 0.02). Use of preopera- 
tive erythropoietin in pediatric patients undergoing sco- 
liosis surgery resulted in higher preoperative hematocrit 
levels. Significantly lower rates of transfusion were 
noted only in the idiopathic group, however. Although 
there is a possibility of erythropoietin “resistance” in 
the neuromuscular and congenital patients, alternative 
explanations for the lack of effect on transfusion rates 
may include under dosing and biases existent in this 
nonrandomized retrospective study. 
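
Because a cohort study compares the proportion of patients with the outcome in each group, the transfusion rates quoted in the abstract above can be turned into a risk difference and a risk ratio. These two summary measures are not reported in the abstract; they are computed here only as a rough illustration, and the function names are assumptions.

```python
def risk_difference(risk_exposed: float, risk_unexposed: float) -> float:
    """Absolute difference in outcome risk between the two cohorts."""
    return risk_exposed - risk_unexposed


def risk_ratio(risk_exposed: float, risk_unexposed: float) -> float:
    """Outcome risk in the exposed cohort relative to the unexposed cohort."""
    if risk_unexposed == 0:
        raise ZeroDivisionError("risk ratio is undefined when the unexposed risk is zero")
    return risk_exposed / risk_unexposed


# Transfusion proportions quoted in the abstract (rhEPO versus untreated).
groups = {
    "entire population": (0.175, 0.306),
    "idiopathic scoliosis": (0.039, 0.235),
}

for name, (treated, untreated) in groups.items():
    rd = risk_difference(treated, untreated)
    rr = risk_ratio(treated, untreated)
    print(f"{name}: risk difference {rd:+.3f}, risk ratio {rr:.2f}")
```

A risk ratio below 1 means transfusion was less frequent in the rhEPO group. As the abstract itself notes, such differences in a nonrandomized retrospective study may partly reflect bias rather than a treatment effect.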


Examples from the Literature: Example of a
Prospective Cohort Study

Source: Shen WJ, Liu TJ, Shen YS. Nonoperative treatment
versus posterior fixation for thoracolumbar junction
burst fractures without neurologic deficit. Spine
2001; 26:1038-1045.?

Study Design: A prospective clinical trial was conducted. 
Objective: To compare the results of nonoperative treat- 
ment versus short-segment posterior fixation using 
pedicle screws. 

Summary of Background Data: A previous study 
showed that nonoperative treatment with early mobili- 
zation produced good results, even when the posterior 
column was involved. 

Methods: This study involved 80 patients. Inclusion cri- 
teria required the following: neurologically intact pa- 
tient, single-level closed burst fracture involving T11- 
L2, no fracture dislocations or pedicle fractures, age of 
18 to 65 years (nonpathologic adult), and no other major 
organ system or musculoskeletal injuries. Patients in the 
nonoperative group (n = 47) were allowed activity to the 
point of pain tolerance beginning on the day of injury 
using a hyperextension brace. Patients in the operative 
group (n = 33) underwent three-level (one above, one 
at fracture level, and one below) fixation using VSP 
or TSRH instrumentation. The follow-up period was 2 
years. 

Results: The surgical group had less pain up to 3 months 
and a better Greenough Low Back Outcome Score up to 6 
months, but the outcome was similar afterward. No neu- 
rologic deficit in any patient. In the nonoperative group, 
the kyphosis angle worsened by 4 degrees, and the ret- 
ropulsion decreased from 34% to 15%. In the operative 
group, there was one case of superficial infection and 
two cases of broken screws. The kyphosis angle was im- 
proved initially by 17 degrees, but this was gradually 
lost. Hospital charges were four times higher in the op- 
erative group. 

Conclusions: Short-segment posterior fixation provides 
partial kyphosis correction and earlier pain relief, but 
the functional outcome at 2 years is similar. Early activity 
to the point of pain tolerance can be safely allowed.


What Prospective Cohort Studies Tell Us


Because the defining characteristic of a cohort is the
sharing of a common exposure characteristic, etiology may be
inferred by following a large cohort over a long time.
However, distinguishing causality from mere correlation
cannot usually be done with results of a cohort study alone.1
Most associations from cohort studies are merely
hypothesis generating, unless they are extremely strong.


When to Use Prospective Cohort Studies 


Prospective cohort studies often require a vast amount of
resources to perform successfully; therefore, it is advised
that a relationship between the exposure characteristic
and the outcome first be established with a less expensive
cross-sectional study design. After the appropriate
variables to be measured have been suggested by
cross-sectional studies, the use of the prospective cohort study
is then warranted.

As the strongest type of observational study, the
prospective cohort study can be used when the gold standard,
a randomized controlled trial (RCT), is not suitable. These
situations arise when performing an RCT is infeasible or
unethical.’ For example, to study the relationship between
smoking cigarettes and lung disease, it would be unethical
to randomize patients to smoking and nonsmoking
groups; therefore, a prospective cohort study would be
more suitable for this kind of situation. In addition,
prospective cohort studies and retrospective cohort studies
are cheaper and easier to conduct than an RCT.


Advantages of a Prospective Cohort Study 


An advantage of the prospective cohort study compared
with the case-control study is its resistance to patient recall
bias, as data are collected during the progress of the study
rather than after the patient has developed the condition
in question. Due to the longitudinal nature of the
prospective cohort study, it also has the advantage of documenting
a clear timeline of progression from exposure to disease.
Also, because a cohort is assembled based on a single
exposure criterion, prospective cohort studies enable us to
study multiple outcomes related to a single exposure.

It is also possible to match participants in a cohort study
for possible confounding variables. Eligibility criteria and
outcome measures can be standardized and insights about
etiology can be made.*


Conclusions


Randomized controlled trials (RCTs) are a superior
methodology in the hierarchy of evidence because they limit the
potential for bias by randomly assigning one patient pool
to an intervention and another patient pool to
nonintervention (or placebo).1 This minimizes the chance that the


Jargon Simplified: Confounding Variable 

“A confounding variable is a factor that distorts the true 
relationship of the study variable of interest by virtue of 
also being related to the outcome of interest. Confound- 
ing variables are often unequally distributed among 
groups being compared.”° 


Limitations of a Prospective Cohort Study 


A prospective cohort study requires a lot of administration 
to follow a large number of patients over a long period to 
minimize attrition bias and is expensive.” Therefore, pro- 
spective cohort studies are best conducted after a relation- 
ship between variables has already been confirmed with a 
less resource taxing study design. Prospective cohort stu- 
dies do not provide empirical evidence that is as strong 
as that provided by a properly executed RCT due to the 
fact that prospective cohort studies are observational in 
nature.” 

Threats to validity in a cohort study include the inherent 
prognostic comparability of the two groups. Therefore, the 
disease or outcome of interest may be related to unidenti- 
fied risk factors.’ In addition, it is not possible to blind the 
attending health care provider or the research partici- 
pant.’ It is possible to blind the outcome assessor in most 
prospective cohort studies in orthopaedic surgery. 


Jargon Simplified: Attrition Bias' 

Attrition bias refers to systematic differences between 
the two cohorts in the loss of participants from the 
study, either from loss to follow-up, withdrawal of con- 
sent, or death. 


incidence of confounding variables will differ between 
the two groups. Nevertheless, it is sometimes not practical 
or ethical to perform RCTs to answer a clinical question.! 
The randomized controlled trial is described in greater de- 
tail in the next chapter. 
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The Randomized Trial 


“If everyone is thinking alike, then somebody isn't thinking.”
General George S. Patton


Summary


Randomized controlled trials are thought to represent the
highest quality of evidence based on their methodological
strengths of randomization of patient assignment and
blinding of intervention and outcome. In this chapter, the
randomized controlled trial is described.


Introduction


Analytic design strategies are broken into two types:
observational studies, such as case-control and cohort
studies, and experimental studies, also called randomized
controlled trials. The difference between the two types of
analytic studies is the role that the investigator plays in
each of the studies. In the observational study, one simply
observes the natural course of events. In a randomized
controlled trial, patients are assigned randomly to a
treatment group or a control group. The control group usually
receives an accepted treatment or no treatment at all,
whereas the treatment group is assigned the intervention
of interest.

Many researchers consider the randomized controlled
trial (RCT) to be the optimal design for determining
treatment efficacy and effectiveness. However, RCTs that are
methodologically challenging are more likely to end up small
and poorly conducted, and such RCTs may be misleading
because their design affords them unwarranted
credibility.¹ Orthopaedic surgery is an area where
surgeons and researchers face several difficulties when
designing RCTs. In addition, surgeons may encounter
unexpected challenges as the trial progresses. In fact, a recent
review of a top orthopaedic surgery journal reported that
very few orthopaedic surgery studies are designed as
RCTs, and the ones that are may be limited by a lack of
concealed randomization, lack of blinding of outcome
assessors, and failure to report reasons for excluding patients.”

Historically, the introduction of new surgical techniques
resulted from the initiative of individual surgeons who
have designed a new piece of equipment or developed a
new technique.’ Some surgeons have resisted the
widespread use of RCTs for the assessment of surgical
procedures, despite strong advocacy by other surgeons.’ The
traditional arguments against RCTs in surgery include the
ethical problem of using a placebo in the controls, the in- 
ability to blind patients and surgeons, the difficulty in ob- 
taining adequate patient numbers, the reluctance of a pa- 
tient to undergo a new operation or use a device until it 
has been fully tested and developed, the hazard to an indi- 
vidual patient’s welfare, and the risk of participating sur- 
geons’ variable skills influencing outcomes.* 

Although many researchers consider the RCT to be the 
most methodologically rigorous of all research designs, 
there are many situations in which RCTs are not feasible, 
necessary, appropriate, or even sufficient to answer impor- 
tant questions. Randomized controlled trials are the ideal 
study design to answer questions related to the effects of 
health care interventions that are small to moderate.° 
Other study designs may be more appropriate to answer 
many orthopaedic trauma research questions. If an RCT is 
not feasible for a particular study, then orthopaedic sur- 
geons should consider advancing the use of alternative re- 
search designs, such as prospective cohort studies.° 

Despite these limitations, orthopaedic surgeons can de- 
sign and conduct high-quality RCTs. Soloman and McLeod’ 
identified three strategies to increase the number and 
quality of RCTs in surgery, including education of surgeons 
in clinical research methods, improved funding of surgical 
RCTs, and compulsory evaluation of new techniques and 
technology before their general adoption is permitted. In 
this chapter, an overview of an RCT is provided and some 
common types of RCTs are described.


A Randomized Controlled Trial Defined


An RCT is an experiment in which individuals are ran- 
domly allocated to receive or not receive an experimental 
preventative, therapeutic, or diagnostic procedure, then 
followed to determine the effect of the intervention.’ Re- 
searchers conducting an RCT typically seek to measure 
and compare different events or outcomes that are present 
or absent after the participants receive the intervention 
(Fig. 11.1).° 


Jargon Simplified: Randomized Controlled Trial 

A randomized controlled trial is “an experiment in 
which individuals are randomly allocated to receive or 
not receive an experimental preventative, therapeutic, 
or diagnostic procedure and then followed to determine 
the effect of the intervention.” 


Pragmatic or Management Trials versus an 
Explanatory Trial 


Randomized controlled trials are often described in the
literature as explanatory trials, or as pragmatic or
management trials. An explanatory trial asks whether a
treatment works in people who receive it, whereas a pragmatic
or management trial asks whether an intervention works
in people to whom it is offered.’ Trials designed to establish
efficacy tend to be explanatory trials because they are
designed to yield a clean evaluation of the effects of the new
intervention. In contrast, effectiveness trials tend to be
pragmatic because they attempt to evaluate the effects of
the intervention in circumstances similar to those found
by clinicians or surgeons in their daily practice.° The design
of an effectiveness RCT is usually simpler than the design of
an efficacy trial, as effectiveness RCTs tend to follow less
strict inclusion criteria, have more flexible treatment
regimens, and allow patients to accept or reject the
treatment offered to them. Effectiveness RCTs usually evaluate
interventions with proven efficacy that are offered to
patients under ordinary clinical conditions.


Selection Bias 


Selection bias occurs when the process by which individuals
are assigned to their treatment group results in
intervention and control groups with different prognostic
features.° Prevention of this type of bias is the justification
for using RCTs to compare and evaluate interventions. By
allocating people randomly to treatment groups, the
characteristics of the people are likely to be similar across the
treatment groups at baseline, which allows investigators
to better isolate and quantify the effects of the
interventions they are studying, while minimizing
the influence of other factors on study outcomes.’ In
addition, observational studies tend to show larger treatment
effects than do RCTs, although systematic underestimation
of treatment effects may also occur.'°

Fig. 11.1 Randomization.


Jargon Simplified: Selection Bias 

Selection bias is “a systematic error in creating interven- 
tion groups, causing them to differ with respect to prog- 
nosis. That is, the groups differ in measured or unmea- 
sured baseline characteristics because of the way in 
which participants were selected for the study or as- 
signed to their study groups.”!! 
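The claim that random allocation tends to balance baseline characteristics can be illustrated with a small simulation. The sketch below is illustrative only: the prognostic factor (age), its range, the sample size, and the function name `simulate_baseline_balance` are all assumptions that do not come from the text.

```python
import random
from statistics import mean


def simulate_baseline_balance(n_patients=10_000, seed=42):
    """Randomly allocate simulated patients and return mean age per group."""
    rng = random.Random(seed)
    # Hypothetical prognostic factor: age drawn uniformly between 20 and 80.
    ages = [rng.uniform(20, 80) for _ in range(n_patients)]
    # Simple randomization: an independent coin flip for every patient.
    groups = [rng.choice(("treatment", "control")) for _ in ages]
    treatment = [a for a, g in zip(ages, groups) if g == "treatment"]
    control = [a for a, g in zip(ages, groups) if g == "control"]
    return mean(treatment), mean(control)


mean_treatment, mean_control = simulate_baseline_balance()
print(f"mean age, treatment group: {mean_treatment:.1f}")
print(f"mean age, control group:   {mean_control:.1f}")
```

With a large simulated sample the two group means come out close to each other even though age was never used in the allocation. The same balancing applies, on average, to prognostic factors that nobody measured, which is what an observational comparison cannot offer.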


Concealment 


Selection bias can still occur even if individuals are
randomly assigned to their treatment groups. It can be
introduced if some potentially eligible individuals are
selectively excluded from the study because of prior knowledge
of the group to which they would be allocated if they
participated in the study. If this were to occur, the assignment
to treatment would no longer be random.

Randomization is concealed if the person who is making 
the decision about enrolling a patient is unaware of 
whether the next patient enrolled in the study will be en- 
tered into the treatment or the control group.* If randomi- 
zation is not concealed, patients with better prognoses 
may tend to be preferentially enrolled in the active treat- 
ment arm, resulting in an exaggeration of the apparent 
benefit of therapy or even falsely concluding that the treat- 
ment is efficacious.® 


Jargon Simplified: Concealment 

“Randomization is concealed if the person who is mak- 
ing the decision about enrolling a patient is unaware of 
whether the next patient enrolled in the study will be 
entered into the treatment or the control group.”® 


Empirical evidence confirms that the effects of new inter- 
ventions can be exaggerated if the randomization se- 
quence is not concealed from the investigators at the 
time of obtaining consent from the potential trial partici- 
pants.*!° A review found that concealed randomization 
was reported in less than a quarter (24%) of RCTs published 
in the orthopaedic surgery literature.'? Some investigators 
use sealed envelopes to conceal randomization. However, 
there are reports of investigators pre-viewing envelope
codes by holding the envelope up to bright light or by
steaming the envelope open and then resealing it. Some 
simply use the allocation for the next patient or discard 
the envelope.’ The best method to conceal randomization 
is to implement randomization through a central methods 
center using an automated randomization system, '° which 
is described in greater detail in Chapter 29. 
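As a rough illustration of why central, automated randomization protects concealment, the sketch below generates an allocation only after a patient has been irrevocably registered. The class `CentralRandomizer`, its methods, and the eligibility and consent flags are hypothetical names chosen for this example; they do not describe any particular randomization system.

```python
import random


class CentralRandomizer:
    """Hypothetical central allocation service (illustrative sketch only)."""

    def __init__(self, arms=("treatment", "control"), seed=None):
        self._rng = random.Random(seed)
        self._arms = arms
        self._log = []  # (patient_id, arm) in order of enrollment

    def enroll(self, patient_id, eligible, consented):
        """Register the patient first; only then generate and reveal the arm."""
        if not (eligible and consented):
            raise ValueError("patient must be eligible and consented before allocation")
        if any(pid == patient_id for pid, _ in self._log):
            raise ValueError("patient already randomized; allocation cannot be redrawn")
        # The arm does not exist until this moment, so nobody can preview it.
        arm = self._rng.choice(self._arms)
        self._log.append((patient_id, arm))
        return arm

    def audit_log(self):
        """Return a copy of the enrollment record for later auditing."""
        return list(self._log)


randomizer = CentralRandomizer(seed=2024)
arm = randomizer.enroll("P001", eligible=True, consented=True)
print("P001 allocated to:", arm)
```

Unlike a stack of sealed envelopes, there is no pre-written next allocation to inspect, and a patient cannot be quietly re-randomized or dropped after the arm is known because the enrollment is logged before the arm is revealed.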


Blinding 


Blinding refers to keeping one or more of the individuals 
involved in the trial unaware of the intervention. In an 
RCT, a number of individuals could possibly be blinded in- 
cluding (1) the patient, (2) the clinicians who administer 
the treatment, (3) the clinicians who take care of the pa- 
tients during the trial, (4) the outcome assessors and 
data collectors, (5) the data analysts, and (6) the investiga- 
tors who interpret and write the results of the trial. Be- 
cause there is confusion regarding blinding terminology 
(triple-blind, double-blind, and single-blind), it is better 
to be explicit about who is blinded in the course of a trial. 


Key Concepts: Who Can Be Blinded 

• The patient
• The clinicians who administer the treatment
• The clinicians who take care of the patients during the
  trial
• The outcome assessors and data collectors
• The data analysts
• The investigators who interpret and write the results
  of the trial


In surgical trials comparing different surgical techniques, it 
is not possible to blind the surgeon. However, the surgeon 
can be blinded in pharmaceutical or drug trials. The best 
way of avoiding the psychological impact of treatment 
(placebo effects) is to ensure that patients are unaware of 
whether they are receiving the experimental treatment.” 
The ultimate means of blinding patients in a surgical trial 
is to use a placebo operation; however, ethical issues can 
arise. 

Differences in patient care other than the intervention 
under study can also bias the results of a trial. For example, 
if treatment group patients receive more intensive post- 
operative care compared with control group patients, the 
results would yield an overestimate of the treatment effect. 
Effective blinding eliminates the possibility of either
conscious or unconscious differential administration of
effective interventions to treatment and control groups. In
many surgical trials, it is possible to have an independent, 
blinded clinician who will undertake the postoperative 
care. 

If the treatment or control group receives closer
follow-up, target outcome events may be reported more
frequently.'° In addition, unblinded clinical research
coordinators who are measuring or recording outcomes such
as clinical status, health-related quality of life, or
radiographic findings may provide different interpretations of
marginal results, or offer differential encouragement
during performance tests, either one of which can distort their
results." The clinical research coordinator assessing
outcome can almost always be kept blind, even if (as is the
case for many operative therapies) patients and treating 
surgeons cannot. Investigators can take additional precau- 
tions by constructing a blinded adjudication committee to 
review clinical data and decide issues such as whether a 
patient has had a malunion, nonunion, or other significant 
complication (Chapter 38). The more judgment is involved 
in determining whether a patient has suffered a target out- 
come (blinding is less crucial in studies in which the out- 
come is all-cause mortality, for instance) the more impor- 
tant blinding becomes.!° 


Examples from the Literature: Example of Blinding 
Source: Moseley JB, O’Malley K, Petersen NJ, Menke TJ, et
al. A controlled trial of arthroscopic surgery for
osteoarthritis of the knee. NEJM 2002;347:82-88,!”
Background: Many patients report symptomatic relief 
after undergoing arthroscopy of the knee for osteoar- 
thritis, but it is unclear how the procedure achieves 
this result. We conducted a randomized, placebo-con- 
trolled trial to evaluate the efficacy of arthroscopy for 
osteoarthritis of the knee. 

Methods: A total of 180 patients with osteoarthritis of 
the knee were randomly assigned to receive arthro- 
scopic débridement, arthroscopic lavage, or placebo sur- 
gery. Patients in the placebo group received skin inci- 
sions and underwent a simulated débridement without 
insertion of the arthroscope. Patients and assessors of 
outcome were blinded to the treatment group assign- 
ment. Outcomes were assessed at multiple points over 
a 24-month period with the use of five self-reported 
scores — three on scales for pain and two on scales for 
function - and one objective test of walking and stair 
climbing. A total of 165 patients completed the trial. 
Results: At no point did either of the intervention groups
report less pain or better function than the placebo
group. For example, mean (±SD) scores on the Knee-Specific
Pain Scale (range, 0 to 100, with higher scores
indicating more severe pain) were similar in the placebo,
lavage, and débridement groups: 48.9 ± 21.9, 54.8 ± 19.8,
and 51.7 ± 22.4, respectively, at 1 year (P = 0.14 for the
comparison between placebo and lavage; P = 0.51 for
the comparison between placebo and débridement)
and 51.6 ± 23.7, 53.7 ± 23.7, and 51.4 ± 23.2, respectively,
at two years (P = 0.64 and P = 0.96, respectively).
Furthermore, the 95 percent confidence intervals for the
differences between the placebo group and the intervention
groups exclude any clinically meaningful difference.
Conclusions: In this controlled trial involving patients 
with osteoarthritis of the knee, the outcomes after ar- 
throscopic lavage or arthroscopic débridement were 
no better than those after a placebo procedure.


Fig. 11.2 Randomization (a) by patient, (b) expertise-based design, and (c) cluster randomization.


Unit of Randomization 


The most frequent unit of randomization in an RCT is the 
patient. The unit of randomization could also be the oper- 
ating surgeon, which is referred to as an expertise-based 
design.'4 The randomization of groups, or clusters, of pa- 
tients rather than individuals characterizes cluster-rando- 
mized trials. Figure 11.2 illustrates the differences be- 
tween patient randomization, the expertise-based design, 
and cluster randomization. 

Devereaux et al'*'> drew attention to the expertise- 
based trial design, which was originally proposed by Van 
der Linden to avoid bias in favor of the participants’ pretrial 
routine. The expertise-based design attempts to avoid the 
ethical, operational, and philosophical problems that ran- 
domization poses for patients and surgeons. Randomiza- 
tion by patient may pose several ethical difficulties. 
When a surgeon believes that one procedure is superior 
to another, but is forced by the study to perform the infer- 
ior operation, a traditional RCT is unethical.'® The exper- 
tise-based design also eliminates the learning period be- 
cause surgeons do not have to learn a new procedure to 
participate in an RCT. In addition, the surgeon is likely to 
be more skilled and comfortable with one procedure 
than with another. Forcing the surgeon to perform an op- 
eration that they are less comfortable with creates perfor- 
mance bias, further compromising the standard RCT 
model.'*'® These issues may jeopardize the surgeon-pa- 
tient relationship. 


Jargon Simplified: Expertise-Based Design 
Expertise-based design is when patients are rando- 
mized to two groups of specialists who perform the pro- 
cedures to be compared, in contrast to the classical de- 
sign where each participating surgeon performs the 
two procedures in a random order."4 


In the expertise-based design, patients are randomized to 
two groups of specialists who perform the procedures to
be compared, in contrast to the classical design where
each participating surgeon performs the two procedures 
in a random order.'* The expertise-based design is particu- 
larly well suited for the evaluation of operative proce- 
dures.'*!° This approach allows the surgeon to remain 
committed to a favored procedure, to project confidence 
to the patient, and to perform the procedure he or she is 
most skilled at. Therefore, the surgeon-patient relation- 
ship is unimpaired and there are few ethical dilemmas.'® 

A disadvantage of the expertise-based design is that it 
may be difficult to coordinate a call schedule with the par- 
ticipating surgeons especially at smaller centers, which 
may result in patients being missed or patients not receiv- 
ing the treatment (surgeon) they were randomized to. 

In cluster-randomized trials, groups or clusters of pa- 
tients are randomized rather than individuals. Examples 
of these groups or clusters are practices, hospitals (or 
trauma centers), families, geographical areas, and commu- 
nities!” Cluster randomization is appropriate when the 
health care intervention is delivered at the level of the or- 
ganizational unit. Examples include educational cam- 
paigns, patient management guidelines that target physi- 
cians or facilities, screening programs, and programs im- 
plemented at the institutional level.'”!® In orthopaedic 
surgery trials, cluster randomization is rarely the best de- 
sign for comparing surgical interventions, but can be useful 
for comparing pain management protocols, rehabilitation 
therapies, and educational programs following the treat- 
ment. 


Jargon Simplified: Cluster Randomization 

The randomization of groups or clusters of patients, 
rather than individuals, characterizes cluster-rando- 
mized trials. In orthopaedic surgery trials, cluster rando- 
mization is rarely the best design for comparing surgical 
interventions, but can be useful for comparing pain 
management protocols, rehabilitation therapies, and 
educational programs following the treatment.


Cluster randomization is also used to avoid contamination
between treatment groups.!® For instance, if certain pa- 
tients on an orthopaedic ward or a rehabilitation center 
were randomly allocated to receive a new rehabilitation 
program, it would be difficult to prevent those given the 
new rehabilitation program from talking to others, thereby 
potentially distorting the results. In this instance, rando- 
mization by trauma center or by rehabilitation center 
would be appropriate. One disadvantage of a cluster RCT 
is that patients within the same practices or clusters are of- 
ten more similar to each other regarding key confounders 
or outcomes of interest than to patients in other practices 
or clusters.'? Consequently, the end result of using a cluster 
RCT is that the design is not as statistically efficient as an 
RCT where individual patients are randomized. Therefore, 
a compensatory increase in sample size is required to 
maintain statistical power in a cluster-randomized trial. 
Given that many orthopaedic trauma RCTs are single-cen- 
ter initiatives, have small sample sizes, and have limited 
funding, cluster RCTs may not be the most feasible study 
design.?? 
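The compensatory increase in sample size is commonly approximated with the design effect, 1 + (m - 1) × ICC, where m is the average cluster size and ICC is the intracluster correlation coefficient. The sketch below applies that formula; it assumes equal cluster sizes, and the example numbers (128 patients, clusters of 15, ICC of 0.04) are illustrative values that do not come from the text.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation from randomizing clusters instead of individuals."""
    return 1 + (cluster_size - 1) * icc


def cluster_sample_size(n_individual, cluster_size, icc):
    """Inflate an individually randomized sample size for a cluster design.

    Returns (total patients needed, number of clusters needed),
    assuming every cluster contributes `cluster_size` patients.
    """
    inflated = n_individual * design_effect(cluster_size, icc)
    # Round first so floating-point noise cannot push a whole number upward.
    inflated = round(inflated, 9)
    return math.ceil(inflated), math.ceil(inflated / cluster_size)


# Illustrative values only: 128 patients would suffice with individual
# randomization; clusters hold 15 patients each; assumed ICC is 0.04.
total, clusters = cluster_sample_size(128, cluster_size=15, icc=0.04)
print(f"design effect: {design_effect(15, 0.04):.2f}")
print(f"patients needed: {total}, clusters needed: {clusters}")
```

Even a small intracluster correlation inflates the requirement noticeably once clusters are large, which is one reason the design is hard to afford in small, single-center studies.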


Randomized Controlled Trial Designs 


Several different RCT designs are used when comparing 
treatment alternatives including the parallel design, the 
crossover design, the factorial design, and the n-of-1 de- 
sign. The advantages and disadvantages of each trial design 
are summarized in Table 11.1.” 


Parallel Trial Design 


The parallel design is the simplest design and is also the 
most common design. In the parallel design, study partici- 
pants or subjects are randomized to two or more groups of 
equal size and each group is exposed to a different inter- 
vention. This trial design produces between-participant 
comparisons. Parallel trials with more than one treatment 
arm (matched to a control arm) provide an opportunity to 
study multiple interventions or different exposures to an
intervention; however, this also demands larger sample
sizes to ensure such trials are adequately powered to
detect clinically significant differences between
interventions (Fig. 11.3).

Fig. 11.3 Parallel trial design. (From Busse JW, Bhandari M,
Schemitsch EH. Randomized trials in surgery. Tech Orthop
2004;19:77-82. Reprinted by permission.)
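One way to obtain parallel groups of equal size is permuted-block randomization, sketched below. The text does not prescribe this particular method; the function name, the block size of 4, and the arm labels are assumptions made for illustration.

```python
import random


def permuted_block_allocation(n_patients, arms=("A", "B"), block_size=4, seed=None):
    """Return an allocation sequence that is balanced within every block."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    sequence = []
    while len(sequence) < n_patients:
        # Each block holds every arm equally often, in random order.
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n_patients]


allocation = permuted_block_allocation(12, seed=1)
print(allocation)
print("A:", allocation.count("A"), "B:", allocation.count("B"))
```

Because each complete block contains every arm the same number of times, the groups stay equal in size throughout recruitment. The trade-off is that small fixed blocks make the last allocation in a block predictable, which is a concealment concern.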


Crossover Trial Design 


A crossover trial design assures that each study participant 
will receive all study interventions; however, the order in 
which they receive the interventions is random.” As such, 
each participant acts as their own control. Crossover trials 
produce within-participant comparisons; thus, they
require fewer participants than parallel trials to produce
statistically and clinically significant results.7°

Several criteria must be met for a crossover trial design 
to be appropriate.”° Study participants must be afflicted 
with chronic or incurable conditions. Conditions that 
may be resolved after a single intervention would not be 
suitable for crossover trials. Study interventions should 
have a rapid onset and a short duration of effect. Interven- 
tions with a long duration of effect risk a carry-over effect 
when the effect of an intervention persists during the test- 
ing of another intervention. When the duration of an inter- 
vention is known, treatment periods can be separated by 
sufficient time to allow the effect to run its course. This 
period between treatments is known as the washout
period. The condition under study must be stable over time to
ensure that any effect noted during the study can be
attributed to the treatment provided and not simply to a change
in the condition that would have occurred with or without
treatment. Differences between study periods that are
the result of fluctuations in the condition being studied,
and not the result of an intervention, are known as period
effects (Fig. 11.4).


Table 11.1 A Comparison of Randomized Controlled Trial (RCT) Designs

| RCT Design | Advantages | Disadvantages |
|------------|------------|---------------|
| Parallel | Simple design that can be applied to most interventions and illnesses | Cannot provide patient-specific information. Each intervention under study requires a large increase in sample size. |
| Crossover | Smaller sample required. Within-subject comparisons ensure that all baseline characteristics are equally distributed. | Vulnerable to carryover effects and period effects. Only possible to test rapid-acting interventions on chronic conditions. |
| Factorial | Allows for the effect of combined therapies to be assessed | Vulnerable to interaction effects |
| n-of-1 | Provides patient-specific information | Results are not generalizable. |

Source: Adapted from Busse JW, Bhandari M, Schemitsch EH. Randomized trials in surgery. Tech Orthop 2004;19:77-82. Reprinted by permission.


Fig. 11.4 Crossover trial design. (From Busse JW, Bhandari M,
Schemitsch EH. Randomized trials in surgery. Tech Orthop
2004;19:77-82. Reprinted by permission.)


Jargon Simplified: Crossover Design 

A crossover trial design ensures that each study partici- 
pant will receive all study interventions; however, the 
order in which they receive the interventions is random. 
Each participant acts as his or her own control. 
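The structure of a crossover trial (every participant receives every intervention, in random order, with a washout between periods) can be sketched as a schedule generator. The function name, the 4-week treatment periods, and the 2-week washout below are hypothetical values chosen for illustration only.

```python
import random


def crossover_schedule(participant_ids, interventions=("A", "B"),
                       period_weeks=4, washout_weeks=2, seed=None):
    """Build a per-participant list of (phase, start_week, end_week)."""
    rng = random.Random(seed)
    schedule = {}
    for pid in participant_ids:
        # Everyone receives all interventions; only the order is randomized.
        order = list(interventions)
        rng.shuffle(order)
        phases, week = [], 0
        for index, intervention in enumerate(order):
            if index > 0:
                # Washout lets the previous effect fade (limits carry-over).
                phases.append(("washout", week, week + washout_weeks))
                week += washout_weeks
            phases.append((intervention, week, week + period_weeks))
            week += period_weeks
        schedule[pid] = phases
    return schedule


for pid, phases in crossover_schedule(["P1", "P2", "P3"], seed=3).items():
    print(pid, phases)
```

The washout length in a real trial would have to be justified by the known duration of each intervention's effect; the fixed two weeks here is only a placeholder.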


Examples from the Literature: Example of a Crossover 
Design 

Source: Fisher P, Scott DL. A randomized controlled trial 
of homeopathy in rheumatoid arthritis. Rheumatology 
2001;40:1052-1055.”! 

Abstract 

Objective: To test the hypothesis that homeopathy is ef- 
fective in reducing the symptoms of joint inflammation 
in rheumatoid arthritis (RA). 

Method: This was a 6-month randomized, crossover, 
double-blind, placebo-controlled, single-center study 
set in a teaching hospital rheumatology out-patient 
clinic. The participants of the study were 112 patients 
who had definite or classical RA, were seropositive for 
rheumatoid factor and were receiving either stable 
doses of single nonsteroidal anti-inflammatory drugs 
(NSAIDs) for >3 months or single disease-modifying 
anti-rheumatic drugs (DMARDs) with or without 
NSAIDs for >6 months. Patients who were severely dis- 


abled, had taken systemic steroids in the previous 6 
months or had withdrawn from DMARD therapy in the 
previous 12 months were excluded. Two series of medi- 
cines were used. One comprised 42 homeopathic medi- 
cines used for treating RA in 6cH (10°!) and/or 30cH 
(10°) dilutions (a total of 59 preparations) manufac- 
tured to French National Pharmacopoeia standards, the 
other comprised identical matching placebos. The 
main outcome measures were visual analogue scale 
pain scores, Ritchie articular index, duration of morning 
stiffness and erythrocyte sedimentation rate (ESR). 
Results: Fifty-eight patients completed the trial. Over 6 
months there were significant decreases (P < 0.01 by 
Wilcoxon rank sum tests) in their mean pain scores 
(fell 18%), articular indices (fell 24%) and ESRs (fell 
11%). Fifty-four patients withdrew before completing 
the trial. Thirty-one changed conventional medication, 
10 had serious intercurrent illness or surgery, 12 failed 
to attend and three withdrew consent. Placebo and ac- 
tive homeopathy had different effects on pain scores; 
mean pain scores were significantly lower after 3 
months’ placebo therapy than 3 months’ active therapy 
(P = 0.032 by Wilcoxon rank sum test). Articular index, 
ESR and morning stiffness were similar with active and 
placebo homeopathy. 

Conclusions: We found no evidence that active homeop- 
athy improves the symptoms of RA, over 3 months, in 
patients attending a routine clinic who are stabilized 
on NSAIDs or DMARDs. 


Factorial Design Trials 


An RCT using a factorial design allows interventions to be 
evaluated both individually and in combination with one 
another. Therefore, two or more hypotheses can be ex- 
plored in one experiment and the effect of combination 
therapy can be assessed. For example, in an RCT using a 
2 x 2 factorial design, participants are allocated to one of 
four possible combinations: (1) treatment A, (2) treatment 
B, (3) treatment A and B, or (4) no treatment. 

There may be an interaction between interventions, and 
this may have an impact on sample size requirements of 
the study. Interactions are more common when trial inter- 
ventions share similarities in their mechanisms of action, 
and result when the effect of one intervention is influ- 
enced by another intervention. For example, in the case 
of a negative interaction, in which the overall effect of in- 
dividual interventions is reduced when they are provided 
together, the sample size would need to be increased to 
still detect a significant difference (Fig. 11.5). 
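The four cells of a 2 x 2 factorial design arise naturally when each treatment is randomized independently, as in the sketch below. The function names and the sample size of 400 are assumptions made for illustration, and the example covers allocation only; it does not model interaction effects.

```python
import random
from collections import Counter


def factorial_2x2_allocation(n_patients, seed=None):
    """Randomize treatment A and treatment B independently for each patient."""
    rng = random.Random(seed)
    return [(rng.random() < 0.5, rng.random() < 0.5) for _ in range(n_patients)]


def cell_label(cell):
    """Map a (gets_a, gets_b) pair to one of the four factorial cells."""
    gets_a, gets_b = cell
    if gets_a and gets_b:
        return "treatment A and B"
    if gets_a:
        return "treatment A"
    if gets_b:
        return "treatment B"
    return "no treatment"


cells = factorial_2x2_allocation(400, seed=11)
counts = Counter(cell_label(c) for c in cells)
for label, count in sorted(counts.items()):
    print(f"{label}: {count}")

# The question about A uses everyone: all patients who received A
# (alone or with B) are compared with all patients who did not.
received_a = counts["treatment A"] + counts["treatment A and B"]
print("received A:", received_a, "did not receive A:", 400 - received_a)
```

This is why a factorial trial can address two questions with one sample: each patient contributes to both the A comparison and the B comparison, provided the interventions do not interact.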


Jargon Simplified: Factorial Design 

Factorial design answers two questions and has multiple 
treatment arms. For example, in a 2 x 2 factorial design, 
participants are allocated to one of four possible combi- 
nations: (1) treatment A, (2) treatment B, (3) treatment 
A and B, or (4) no treatment.”°


Examples from the Literature: Example of a Factorial
Design

Source: Kerkhoffs GM, Struijs PA, de Wit C, Rahlfs VW, 
Zwipp H, van Dijk CN. A double blind, randomized, par- 
allel group study on the efficacy and safety of treating 
acute lateral ankle sprain with oral hydrolytic enzymes. 
Br J Sports Med. 2004;38(4):431-435.72 

Objective: To compare the effectiveness and safety of the 
triple combination Phlogenzym (rutoside, bromelain, 
and trypsin) with double combinations, the single sub- 
stances, and placebo. 

Design: Multinational, multicenter, double-blind,
randomized, parallel group design with eight groups
structured according to a factorial design.

Setting: Orthopaedic surgery and emergency depart- 
ments in 27 European hospitals. 

Participants: A total of 721 patients aged 16-53 years 
presenting with acute unilateral sprain of the lateral an- 
kle joint. 

Primary efficacy criteria: (a) Pain on walking one or 
two steps, as defined by the patient on a visual analogue 
scale. (b) The range of motion, as measured by the inves- 
tigator and expressed as a sum of flexion and extension. 
(c) The volume of the injured ankle measured with a vol- 
ometer. 

Results: At the primary end point at 7 days, the greatest 
reduction in pain was in the bromelain/trypsin group 
(73.7%). The Phlogenzym group showed a median reduc- 
tion of 60.3%, and the placebo group showed a median 
reduction of 73.3%. The largest increase in range of mo- 
tion (median) was in the placebo group (60% change 
from baseline). The Phlogenzym group showed a median 
increase of 42.9%. The biggest decrease in swelling was 
in the trypsin group (3.9% change from baseline). The 
Phlogenzym group showed a -2.30% change from base- 
line and the placebo group a -2.90% change. In the sub- 
group analysis of patients who did not use a Caligamed 
brace, Phlogenzym was superior to placebo for the sum- 
marizing directional test of the primary efficacy criteria
(MW = 0.621; LB-CI 0.496; P = 0.029; one sided Wei-
Lachin procedure). The vast majority of doctors and 
patients rated the tolerability of all treatments tested 
as very good or at least good. 

Conclusions: Phlogenzym was not found to be superior 
to the three two-drug combinations, the three single 
substances, or placebo for treatment of patients with 
acute unilateral sprain of the lateral ankle joint. The 
small subgroup of patients treated without the support 
of a Caligamed brace showed evidence of superiority of 
Phlogenzym over placebo. Further research is warranted 
to study this effect of Phlogenzym in patients treated 
without ankle support. 
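
The arms of a factorial design can be enumerated mechanically: every treatment is either given or withheld, and each combination forms one group. The following is a minimal sketch in Python, not taken from the trial report; the function name `factorial_arms` and the label "placebo" for the no-treatment arm are assumptions made for illustration. It shows why two treatments give four arms and why three substances, as in the trial above, give eight groups.

```python
from itertools import product

def factorial_arms(treatments):
    """Return every given/withheld combination of the listed treatments."""
    arms = []
    for flags in product([False, True], repeat=len(treatments)):
        given = [name for name, on in zip(treatments, flags) if on]
        # An arm in which nothing is given is labeled "placebo" (assumed label).
        arms.append(" + ".join(given) if given else "placebo")
    return arms

two_by_two = factorial_arms(["A", "B"])
three_way = factorial_arms(["rutoside", "bromelain", "trypsin"])

print(len(two_by_two), two_by_two)   # four arms for a 2 x 2 design
print(len(three_way))                # eight groups for three substances
```

The number of arms doubles with each added treatment, which is one reason factorial trials with many factors quickly need large sample sizes.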


n-of-1 Trials 


A criticism of the other RCT designs is that they may 
provide good information on treatment outcome for the 
average patient, but are poorly equipped to provide indivi- 
dual-specific information. Randomized controlled trials in 
individuals are possible and require limited resources. 
Such n-of-1 studies are conducted by systematically vary- 
ing the management of a patient’s illness during a series of 
treatment periods (alternating between experimental and 
control interventions) to confirm the effectiveness of treat- 
ment in the individual patient.” The number of pairs of 
interventions often varies from two to seven, but this num- 
ber is not specified in advance and the clinician and patient 
can decide to stop when it becomes clear that there are, or 
are not, important differences between interventions 
(Fig. 11.6). 


Jargon Simplified: n-of-1 Trials 

“n of 1’ trials are conducted by systematically varying 
the management of a patient’s illness during a series of 
treatment periods (alternating between experimental 
and control interventions) to confirm the effectiveness 
of treatment in the individual patient.” 7° 
Fig. 11.6 n-of-1 trial design.
(From Busse JW, Bhandari M, 
Examples from the Literature: Example of an n-of-1 Design

Source: Woodfield R, Goodyear-Smith F, Arroll B. N-of-1 
trials of quinine efficacy in skeletal muscle cramps of the 
leg. Br J Gen Pract. 2005;55(512):181-185.7? 
Background: Skeletal muscle cramps affect over a third 
of the ambulatory elderly population. Quinine is the es- 
tablished treatment, but there are safety concerns, and 
evidence for efficacy is conflicting. A recent meta-analy- 
sis established a small advantage for quinine, but identi- 
fied the need for additional studies. N-of-1 trials com- 
pare two treatments, in a randomized, double-blind, 
multiple crossover study on a patient-by-patient basis. 
They have been used to compare treatments in osteoar- 
thritis and may be suitable for determining the indivi- 
dual efficacy of quinine. 

Aim: To establish efficacy and safety of quinine sulfate 
use for the treatment of leg-muscle cramp. 

Design of Study: Double-blind, randomized series of n- 
of-1 controlled trials of quinine versus placebo for mus- 
cle cramps. 

Setting: New Zealand general practices. 

Method: The participants were 13 general practice pa- 
tients (six males; seven females; median age = 75 years) 
already prescribed quinine. Following a 2-week wash- 
out, each patient received three 4-week treatment 
blocks of quinine sulfate and matched placebo capsules 
with an individual, randomized crossover design. The 
main outcome measures were: patient diaries of cramp 
occurrence, duration and severity; capsule counts; and 
blood quinine levels in the final treatment block. 
Results: Ten patients completed the trial. Three patients 
were identified for whom quinine was clearly beneficial 
(P < 0.05), six showed nonsignificant benefit and one 
showed no benefit. All patients elected to continue qui- 
nine post-study. 


Conclusion: Series of n-of-1 studies differentiated pa- 
tients whom quinine had statistically significant effects; 
those with trend toward effectiveness; those for whom 
quinine was probably not effective. Ideally n-of-1 trial 
should be performed when a patient is commenced on 
quinine. More cycles in n-of-1 studies of quinine may ad- 
dress issues of statistical power. 
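
The treatment schedule of an n-of-1 trial can be sketched in a few lines: within each pair of treatment periods, the order of the experimental and control intervention is decided at random. The sketch below is a hypothetical illustration in Python, not the allocation procedure used in the trial above; the function name `n_of_1_schedule`, the seed, and the three-pair example are assumptions.

```python
import random

def n_of_1_schedule(n_pairs, treatments=("experimental", "control"), seed=None):
    """Randomize the order of the two treatments within each pair of periods."""
    rng = random.Random(seed)
    schedule = []
    for _ in range(n_pairs):
        pair = list(treatments)
        rng.shuffle(pair)            # random order within this pair
        schedule.append(tuple(pair))
    return schedule

# Hypothetical example: three pairs of quinine and placebo periods.
schedule = n_of_1_schedule(3, treatments=("quinine", "placebo"), seed=1)
for number, pair in enumerate(schedule, start=1):
    print(f"pair {number}: {pair[0]} then {pair[1]}")
```

Because every pair contains both interventions, the patient serves as his or her own control, and the clinician and patient can stop adding pairs once the difference (or lack of one) is clear.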


Critical Appraisal of the Surgical Literature 


As featured in the Surgical Users’ Guides, checklists have 
been developed on how to effectively critically appraise 
and evaluate a randomized controlled trial. 


Examples from the Literature: Checklist for Critical 
Appraisal of a Randomized Controlled Trial 
“Validity
  Did experimental and control groups begin the study
  with a similar prognosis?
    Were patients randomized?
    Was randomization concealed?
    Were patients analyzed in the groups to which they
    were randomized?
    Were patients in the treatment and control groups
    similar with respect to known prognostic factors?
  Did experimental and control groups retain a similar
  prognosis after the study started?
    Blinding
      Did investigators avoid effects of patient awareness
      of allocation - were patients blinded?
      Were aspects of care that affect prognosis similar
      in the two groups - were clinicians blinded?
      Was outcome assessed in a uniform way in experimental
      and control groups - were those assessing outcome
      blinded?
    Was follow-up complete?

Results
  How large was the treatment effect?
  How precise was the estimate of the treatment effect?

Applicability
  Can the results be applied to my patient?
  Were all patient-important outcomes considered?
  Are the likely treatment benefits worth the potential
  harm and costs?”


Source: From Bhandari M, Guyatt GH, Swiontkowski
MF. Users’ guide to the orthopaedic literature: how to 
use an article about a surgical therapy. J Bone Joint 
Surg Am 2001; 83A:917. Reprinted by permission. 



Suggested Reading 


Bhandari M, Guyatt GH, Swiontkowski MF. Users’ guide to the
orthopaedic literature: how to use an article about a surgical
therapy. J Bone Joint Surg Am 2001; 83:916-926

Devereaux PJ, Bhandari M, Clarke M, et al. Need for expertise based
randomised controlled trials. BMJ 2005; 330:88

Guyatt GH, Rennie D, eds. Users’ Guide to the Medical Literature: A
Manual for Evidence-Based Clinical Practice. Chicago, IL: AMA Press;
2002

Jadad A. Randomized Controlled Trials. London: BMJ Books; 1998


References 


1. McCulloch P, Taylor I, Sasko M, Lovett B, Griffin D. Randomized trials in surgery: problems and possible solutions. BMJ 2002; 324:1448-1451
2. Bhandari M, Richards RR, Sprague S, Schemitsch EH. The quality of reporting of randomized trials in the Journal of Bone and Joint Surgery from 1988 through 2000. J Bone Joint Surg Am 2002; 84:388-396
3. Stirrat GM, Farndon J, Farrow SC, Dwyler N. The challenge of evaluating surgical procedures. Ann R Coll Surg Engl 1992; 74:80-84
4. Buchwald H. Surgical procedures and devices should be evaluated in the same way as medical therapy. Control Clin Trials 1997; 18:478-487
5. Jadad A. Randomized Controlled Trials. London: BMJ Books; 1998
6. Solomon MJ, McLeod RS. Surgery and the randomized controlled trial: past, present, and future. Med J Aust 1998; 169:380-383
7. Solomon MJ, McLeod RS. Should we be performing more randomized controlled trials evaluating surgical operations? Surgery 1995; 118:459-467
8. Guyatt GH, Rennie D, eds. Users’ Guides to the Medical Literature: A Manual for Evidence-Based Clinical Practice. Chicago, IL: AMA Press; 2001
9. Bhandari M, Tornetta P III, Guyatt GH. Glossary of evidence-based orthopaedic terminology. Clin Orthop Relat Res 2003; 413:158-163
10. Bhandari M, Guyatt GH, Swiontkowski MF. Users’ guide to the orthopaedic literature: how to use an article about a surgical therapy. J Bone Joint Surg Am 2001; 83:916-926
11. Altman DG, Schulz KF, Moher D, et al. CONSORT GROUP (Consolidated Standards of Reporting Trials). The revised CONSORT statement for reporting randomized trials: explanation and elaboration. Ann Intern Med 2001; 134(8):663-694
12. Bhandari M, Guyatt G, Lochner H, Sprague S, Tornetta P. Application of the consolidated standards of reporting trials (CONSORT) to the fracture care literature. J Bone Joint Surg Am 2002; 84:485-489
13. Moseley JB, O'Malley K, Petersen NJ, et al. A controlled trial of arthroscopic surgery for osteoarthritis of the knee. N Engl J Med 2002; 347:82-88
14. Devereaux PJ, Bhandari M, Clarke M, et al. Need for expertise based randomised controlled trials. BMJ 2005; 330:88
15. Van der Linden W. Pitfalls in randomized surgical trials. Surgery 1980; 87:258-262
16. Kendig RJ. Letter to the Editor “Operative treatment of fractures of the tibial plafond. A randomized prospective study.” J Bone Joint Surg Am 1997; 79:1893-1894
17. Jordhoy MS, Fayers PM, Ahlner-Elmqvist M, Kaasa S. Lack of concealment may lead to selection bias in cluster randomized trials of palliative care. Palliat Med 2002; 16:43-49
18. Fayers PM, Jordhoy MS, Kaasa S. Cluster-randomized trials. Palliat Med 2002; 16:69-70
19. Cosby RH, Howard M, Kaczorowski J, Willan A, Sellors J. Randomizing patients by family practice: sample size estimation, intracluster correlation, and data analysis. Fam Pract 2003; 20:77-82
20. Busse JW, Bhandari M, Schemitsch E. Randomized Trials in Surgery. Tech Orthop 2004; 19:77-82
21. Fisher P, Scott DL. A randomized controlled trial of homeopathy in rheumatoid arthritis. Rheumatology 2001; 40:1052-1055
22. Kerkhoffs GM, Struijs PA, de Wit C, Rahlfs VW, Zwipp H, van Dijk CN. A double blind, randomized, parallel group study on the efficacy and safety of treating acute lateral ankle sprain with oral hydrolytic enzymes. Br J Sports Med 2004; 38(4):431-435
23. Woodfield R, Goodyear-Smith F, Arroll B. N-of-1 trials of quinine efficacy in skeletal muscle cramps of the leg. Br J Gen Pract 2005; 55(512):181-185




12 
Meta-Analysis 


“One bastard goes in, another one comes out.”
— Tuco in The Good, the Bad, and the Ugly


Summary


In this chapter, key design issues in conducting and reviewing
meta-analyses in surgery are presented. Different types of reviews
are discussed, including narrative reviews, systematic reviews, and
meta-analyses. Only an overview is provided here; conducting a
systematic review is not a simple task. Consequently, a clinician
not educated in epidemiology or research methodology should consult
an epidemiologist prior to embarking on a systematic review or
meta-analysis.


Introduction


Historically, the concept of pooling data resulted from published
studies in psychology research. The pooling started with “vote
counting,” summing up the studies with positive results and the
ones with negative results. However, this method did not take into
account the sample size of the included studies, which led to more
sophisticated analysis, utilizing statistical techniques. In
surgery, most reviews are written on a “this is how I (or we) do
it” basis. Consequently, published surgical studies often had small
sample sizes, even smaller than the psychology studies that made
pooling necessary. To overcome the insufficient power, a
meta-analysis of pooled data is undertaken. At present, three types
of reviews are frequently published: narrative reviews, systematic
reviews, and meta-analyses. Each is discussed in this chapter.


Jargon Simplified: Study Power

“In a comparison of two interventions, study power is
the ability to detect a difference between the two
experimental groups, if one in fact exists.”?


Key Concepts: Types of Reviews

• Narrative review

• Systematic review

• Meta-analysis


Narrative Review


A narrative review usually focuses on a broader topic. For
example, it may describe femoral fractures. Current concepts are a
form of narrative reviews, which often describe how a group of
surgeons treats a specific problem, describing a wide spectrum of
treatment options. The literature is summarized, but the methods
used to find the literature are not systematic and the included
studies are not evaluated on their scientific merits. Textbook
chapters are written in this way and narrative reviews can also be
helpful to answer background questions. Because the literature is
not systematic in this type of review, it might have missed
relevant studies, or the authors may have selected studies that
concurred with their opinion.


Jargon Simplified: Background Questions 

Background questions focus on the principles of (patho-) 
physiology, and basic anatomy in broader terms to un- 
derstand “a medical principle or problems”; they are 
usually asked by medical students or residents.” 



Systematic Review


A systematic review is different from a narrative review in
that it addresses a more specific question and uses a systematic
approach. It is helpful in answering foreground questions. For
example, “In a patient with an acute rupture of the lateral ankle
ligament, is cast immobilization or functional taping a better
treatment option if the patient wants to return to work as soon as
possible?” In a systematic review, a specific patient-oriented
question is asked, a systematic literature search is performed, the
methodological quality of the included studies is carefully
analyzed, and the results are summarized.


Jargon Simplified: Foreground Questions

Foreground questions, in contrast to background questions,
focus on clinical situations - how to solve a clinical
dilemma for an individual patient.?


Examples from the Literature: Example of a Systematic Review

Source: Kerkhoffs GM, Struijs PA, Marti RK, Blankevoort
L, Assendelft WJ, van Dijk CN. Functional treatments for
acute ruptures of the lateral ankle ligament: a systematic
review. Acta Orthop Scand 2003;74(1):69-77.1

Abstract: Our aim with this systematic review was to assess
the effectiveness of various functional treatments for acute
ruptures of the lateral ankle ligament in adults. We performed
an electronic database search using MEDLINE, EMBASE, COCHRANE
CONTROLLED TRIAL REGISTER and CURRENT CONTENTS. We evaluated
randomized clinical trials describing skeletally mature subjects
with an acute rupture of the lateral ankle ligament and compared
functional treatments for inclusion in this study. Nine trials met
our inclusion criteria. Two reviewers independently assessed the
quality of these trials and extracted relevant data on treatment
outcome. Where appropriate, results of comparable studies were
pooled. Individual and pooled statistics are reported as relative
risks (RR) for dichotomous outcome and (weighted) mean differences
(WMD) for continuous outcome measures with 95% confidence
intervals (95% CI). Heterogeneity between the trials was tested
using a standard chi-square test. Persistent swelling at
short-term follow-up was less with lace-up ankle support than with
semi-rigid ankle support (RR 4.2; 95% CI 1.3-14), an elastic
bandage (RR 5.5; 95% CI 1.7-18) and tape (RR 4.1; 95% CI 1.2-14).
A semi-rigid ankle support required a shorter period for return to
work than an elastic bandage (WMD 4.2; 95% CI 2.4-6.1) (P = 0.7).
One trial reported better results for subjective instability using
the semi-rigid ankle support than the elastic bandage (RR 8.0; 95%
CI 1.0-62). Treatment with tape resulted in more complications,
mostly skin problems, than that with an elastic bandage (RR 0.1;
95% CI 0.0-0.8). We found no other statistically significant
differences. We conclude that an elastic bandage is a less
effective functional treatment. Lace-up supports seem better, but
the data are insufficient as a basis for definite conclusions.


Meta-Analysis


A meta-analysis utilizes a similar systematic approach to a
systematic review as both are addressing a foreground question.
The majority of systematic reviews starts with the aim to pool
statistically the results in a meta-analysis, but due to
insufficient quality of the primary studies or heterogeneity, the
pooling of data is not possible. Therefore, the primary difference
between a meta-analysis and a systematic review is that a
meta-analysis statistically pools the results from the primary
studies.


Jargon Simplified: Heterogeneity

“Heterogeneity refers to differences between patients or
differences in the results of different studies.”


Jargon Simplified: Primary Study

A primary study is a clinical study included in a
meta-analysis or a systematic review. From this study, data
are abstracted and used in the final statistical analysis
to pool the results?


Jargon Simplified: Pooling of Data

Pooling of data is typically done in a two-stage process
for meta-analysis. In the first stage, a summary statistic
is calculated for each study. For controlled trials, these
values describe the treatment effects observed in each
individual trial. In the second stage, a pooled treatment
effect estimate is calculated as a weighted average of the
treatment effects estimated in the individual studies.’


In the literature, meta-analyses are sometimes referred to
as a systematic review, although they statistically pool data
using meta-analytic tools. The pooling of data from international
trials facilitates generalizability of the results in a larger
context than the results from a single-center trial.


Jargon Simplified: Generalizability

Generalizability is the term used to describe the “ability
to generalize the findings of a study to a larger group of
similar people.”
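
The two-stage pooling process described under “Pooling of Data” can be sketched with a short calculation. The sketch below assumes one common weighting scheme, the fixed-effect inverse-variance method, in which each study is weighted by the inverse of its squared standard error; the text itself only specifies “a weighted average,” so this choice, the function name `pool_fixed_effect`, and the example numbers are illustrative assumptions, not data from any trial.

```python
import math

def pool_fixed_effect(effects, std_errors):
    """Pool per-study effects with inverse-variance weights (fixed effect)."""
    # Stage 1 input: one summary statistic (effect and standard error) per study.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    # Stage 2: the pooled effect is the weighted average of the study effects.
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    ci_95 = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci_95

# Hypothetical mean differences (in days) and standard errors from three trials.
effects = [4.0, 5.0, 3.0]
std_errors = [1.0, 2.0, 1.0]
pooled, pooled_se, ci_95 = pool_fixed_effect(effects, std_errors)
print(round(pooled, 2), round(pooled_se, 2), [round(x, 2) for x in ci_95])
```

The imprecise second study (standard error 2.0) receives a quarter of the weight of the other two, so it pulls the pooled estimate less. This is the feature that distinguishes statistical pooling from simply averaging or counting study results.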
How to Conduct a Systematic Review or Meta-Analysis


Like any other research protocol, a systematic review starts 
with a study question. After one has framed the question to 
be answered one can start a literature search. Details on 
how to ask answerable questions and conduct literature 
searches are given in Chapter 27. A guide to how to conduct 
a systematic review is featured below (Table 12.1). 


Table 12.1 Guide to Conducting a Systematic Review

Define the Question
  Specify inclusion and exclusion criteria:
    Population
    Intervention or exposure
    Outcome
    Methodology
  Establish a priori hypotheses to explain heterogeneity

Conduct Literature Search
  Decide on information sources: databases, experts, funding
  agencies, pharmaceutical companies, personal files, registries,
  citation lists of retrieved articles
  Determine restrictions: time-frame, unpublished data, language
  Identify titles and abstracts

Apply Inclusion and Exclusion Criteria
  Apply inclusion and exclusion criteria to titles and abstracts
  Obtain full articles for eligible titles and abstracts
  Apply inclusion and exclusion criteria to full articles
  Select final eligible articles
  Assess agreement between reviewers on study selection

Abstract Data
  Abstract data on participants, interventions, comparison
  interventions, study design
  Abstract results data
  Assess methodological quality
  Assess agreement between reviewers on validity assessment

Conduct Analysis
  Determine method for pooling of results
  Pool results (if appropriate)
  Decide on handling missing data
  Explore heterogeneity
  Sensitivity and subgroup analysis
  Explore possibility of publication bias

Source: From Bhandari M, Guyatt GH, Montori V, Devereaux PJ,
Swiontkowski MF. Users’ guide to the orthopaedic literature: how
to use a systematic literature review. J Bone Joint Surg Am 2002;
84:1672-1682. Reprinted by permission.


Define the Question 


To refine the clinical question to be answered one needs to
specify the inclusion and exclusion criteria. A detailed
description on how to phrase answerable questions is provided in
Chapter 27. To do this, you need to define the study population
(patients), the intervention or exposure, and the outcome.

Next, one has to decide the quality of the studies to be 
included. Based on the study design and methodology 
one can exclude studies a priori. For example, many sys- 
tematic reviews only include randomized controlled trials 
(RCTs). However, RCTs are not always available for inter- 
ventions under investigation in the review. 


Jargon Simplified: Randomized Controlled Trial 

A randomized controlled trial is “an experiment in 
which individuals are randomly allocated to receive or 
not receive an experimental preventative, therapeutic, 
or diagnostic procedure and then followed to determine 
the effect of the intervention.”° 


In that case, the reviewers have to rely on lesser forms of evidence,
such as case-control studies. Even though RCTs are rated as the
highest level of evidence, methodological limitations are 
frequently observed in these studies.° Therefore, reviewers 
evaluate the quality of the included (also known as pri- 
mary) studies utilizing study quality ratings scores like 
the Cochrane score or the Detsky score.” Using these 
study quality rating tools is not without hazard.* Each primary
study is unique: methodological safeguards (like blinding) can be
used in one study but may be impossible to use in another; thus,
the use of summary scores should be avoided. For example, one
study may score low on its randomization method but high on
completeness of follow-up, whereas another may score high on
randomization and low on the follow-up achieved. Both studies may
receive the same total score; hence, the summary score leaves us
uninformed about each trial’s unique features.
Another hazard is the use of thresholds when scoring 
study quality. This threshold is often arbitrary and relying 
on it can skew the results of a meta-analysis.® 
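
The hazard of summary scores can be made concrete with a small sketch. The items and the 0 to 2 scoring scale below are hypothetical and do not reproduce the Cochrane or Detsky instruments; the function names are likewise assumptions. The point is only that two studies with opposite weaknesses can receive identical totals.

```python
# Hypothetical item scores (0 = inadequate, 1 = unclear, 2 = adequate).
study_a = {"randomization": 0, "blinding": 1, "follow_up": 2}
study_b = {"randomization": 2, "blinding": 1, "follow_up": 0}

def summary_score(items):
    """Collapse all item scores into a single total."""
    return sum(items.values())

def weak_items(items, minimum=1):
    """List the items scoring below the chosen minimum."""
    return sorted(name for name, score in items.items() if score < minimum)

print(summary_score(study_a), summary_score(study_b))  # identical totals
print(weak_items(study_a), weak_items(study_b))        # different weaknesses
```

Reporting the individual items, as `weak_items` does, keeps the information that the total discards: one study is vulnerable to selection bias, the other to attrition bias.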

Keep the evaluation of the primary studies simple. Four
factors influence the internal validity of a study: (1)
randomization and allocation concealment to prevent selection
bias, (2) keeping care programs other than the treatment under
investigation the same in all groups to prevent performance bias,
(3) independent or blinded outcome assessment to prevent detection
bias, and (4) prevention of protocol deviations and losses to
follow-up to limit attrition bias.° To help detect these types of
bias, the Cochrane
quality assessment tool can be used. Recently, a new qual- 
ity-evaluation tool, a specially designed checklist, has been 
developed to evaluate surgical and other nonpharmaceuti- 
cal trials: The CLEAR NPT (checklist to evaluate a report of 
a nonpharmacological trial).®!° One does not need to be 
blinded when evaluating the methodological soundness 
of primary studies."!? The overview by Katrak’ describes 
critical appraisal tools. 


Meta-analysis including studies with poor methodologi- 
cal quality will never be able to provide the same evidence 
as a meta-analysis including high-quality trials. “Garbage 
in is garbage out” is the term used by critics; unfortunately, 
they are often right. 


Key Concepts: Types of Bias in Primary Studies? 

• “Selection bias: Biased allocation to comparison groups

• Performance bias: Unequal provision of care apart
  from treatment under evaluation

• Detection bias: Biased assessment of outcome

• Attrition bias: Biased occurrence and handling of
  deviations from protocol and loss to follow-up”


Next, a priori hypotheses need to be established to explain 
heterogeneity. 


Jargon Simplified: A Priori Hypothesis 

An a priori hypothesis is generated before the study or 
experiment, not based upon experimental fact. Thus, 
this hypothesis will be tested in the experiment, and is 
created before the meta-analysis is conducted, not after 
the data of the primary studies were analyzed. 


Heterogeneity in the primary studies can be due to
patient-specific factors (e.g., younger versus older patients),
primary study design factors (e.g., RCTs versus case series),
methodological quality factors (e.g., studies with adequate
randomization, concealed patient allocation, and blinded outcome
assessment versus studies not utilizing these methodological
safeguards), and intervention factors (e.g., whether the same
surgical technique was utilized in all studies). Reviewers need to
decide a priori
how to deal with possible heterogeneity; if many sources of 
heterogeneity are identified the studies cannot be pooled in 
a meta-analysis. The best way to identify heterogeneity is 
by common sense and clinical expertise." The key issue 
is “Can we compare apples to oranges?” Clinical heteroge- 
neity can be obvious, for example, differences in the study 
population between primary studies.'* A primary study 
may have included only American Society of Anesthesiologists
(ASA) class 1 patients, whereas another study included patients
with more comorbidities, such as ASA class 2 and 3 patients. These
studies thus included different patient populations and are
clinically heterogeneous. Other studies may have allowed inclusion
of patients with previous failed surgery, whereas others 
only included patients undergoing primary surgery.'4 
Clearly, these studies are clinically heterogeneous and pool- 
ing of the results may skew the reality. 


Key Concepts: Heterogeneity” 

• Clinical heterogeneity: Differences between studies in
  patient population (old versus young, ill versus mildly ill,
  gender, patients who had previous surgery), treatment under
  investigation (duration of treatment, intensity, dosage,
  surgical technique), the countries in which the studies were
  conducted, outcome definitions, and so on

• Statistical heterogeneity: More variation among primary
  studies than expected by chance alone

• First, you evaluate possible sources of clinical
  heterogeneity; then you check with a statistical test to
  confirm your ideas were correct.


Statistical heterogeneity can be revealed using statistical 
tests when you are pooling the data. If these tests reveal 
heterogeneity you can decide not to pool the data. In addi- 
tion, you need to think a priori about planning a subgroup 
analysis to explain heterogeneity. Clinical expertise is very 
helpful in determining sources of clinical heterogeneity; 
this is an argument to have both clinicians and epidemiol- 
ogists cooperating in the conduct of a systematic review or 
meta-analysis. 
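
One widely used statistical test for heterogeneity is Cochran's Q, a chi-square statistic on k - 1 degrees of freedom for k studies. The sketch below computes Q together with the I-squared index, which expresses the share of variation beyond chance as a percentage. The chapter does not name a specific test here, so the choice of Q and I-squared, the function name `cochran_q`, and the example numbers are assumptions made for illustration.

```python
def cochran_q(effects, std_errors):
    """Return Cochran's Q, its degrees of freedom, and I-squared (%)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Q sums the weighted squared deviations of each study from the pooled effect.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I-squared: proportion of Q in excess of its expected value under chance.
    i_squared = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i_squared

# Hypothetical data: three similar trials, then two clearly divergent trials.
similar = cochran_q([4.0, 5.0, 3.0], [1.0, 2.0, 1.0])
divergent = cochran_q([1.0, 5.0], [1.0, 1.0])
print(similar)
print(divergent)
```

Q is compared with the chi-square distribution: at the 5% level the critical value is about 3.84 for one degree of freedom and about 5.99 for two. In this hypothetical example the similar trials stay well below the threshold, whereas the divergent pair exceeds it, which would argue against pooling them without further exploration.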


Conduct a Literature Search 


After defining your question, the next step is to conduct a
thorough search of the literature. To do this effectively, it is
helpful to consult with a librarian; collaborating with this
professional on search strategies will help you avoid both missing
important articles and retrieving an abundance of irrelevant
articles.

Next, you have to decide on information sources to be 
searched: databases, experts, funding agencies, pharma- 
ceutical companies, personal files, registries, and citation 
lists of retrieved articles.!®1” Searching only one database, 
such as MEDLINE, will result in a limited number of publi- 
cations, which can bias the result of your meta-analysis.!® 
Consider using restrictions such as timeframe, unpublished
data, or language. If the intervention you are reviewing is recent,
you may restrict your search to publications dated after the
introduction of the new drug or implant. For example, a literature
search on COX2 inhibitors does not need to include studies
published before 1980 because this type of drug has been available
only since the 1990s.

Unpublished data, also known as the gray literature, 
should also be used in your review. Gray literature comprises
scientific meeting abstracts and other data from
clinical trials not published in a peer-reviewed journal.


Jargon Simplified: Gray Literature 

The gray literature encompasses scientific and technical 
reports, patent documents, conference papers, internal 
reports, government documents, newsletters, fact 
sheets, and theses, which are not readily available 
through commercial or library channels. The gray litera- 
ture specifically does not include accepted scientific 
journals, books, or popular publications that are avail- 
able through traditional commercial publication chan- 
nels. Often, the only published manuscript of a specific 
trial can be found in the gray literature.'!*° 
Not including the gray literature may skew the meta-analysis
results because negative or undecided trial results are
more difficult to publish, resulting in publication bias.?!?


Jargon Simplified: Publication Bias 

“Publication bias occurs when the publication of re- 
search depends on the direction of the study results 
and whether they are statistically significant.”® 


Language restrictions are used to simplify the conduct of a
review because the translation of primary articles can be
time consuming. However, restricting the search to the
English language reduces the generalizability of the review’s
results to non-English-speaking countries. A meta-analysis may
also be biased if you do not include languages other than
English.”?


Apply Inclusion and Exclusion Criteria 


Your literature search will result in relevant and irrelevant 
titles. Hence, the next step will be to limit your results by 
identifying the relevant titles. This is a manual task, going 
over all the titles and selecting the relevant ones. If the title 
does not give enough clues on the article’s relevance, you 
will need to view it as possibly relevant and subsequently 
read the abstract to determine its relevance. 


If you are still not sure whether to include the study, you 
should read the full text of the article and finalize your de- 
cision. The primary study selection process is complicated 
and ideally performed by at least two reviewers to ensure 
the correct studies are identified and included in the re- 
view. Reviewer agreement can be calculated using kappa 
statistics or a similar approach. The final inclusion of 
studies can be decided based on consensus among the 
reviewers during a consensus meeting. If needed, a third 
reviewer can help to overcome disagreement. Applying the inclusion
and exclusion criteria first to titles and abstracts and then to
the full articles will yield the final list of articles to use for
data abstraction.
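
Reviewer agreement on study selection can be quantified with Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. The following is a minimal sketch for two reviewers; the function name `cohens_kappa` and the include/exclude decisions are hypothetical.

```python
def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    categories = set(rater_a) | set(rater_b)
    # Expected agreement if both raters decided independently at their own rates.
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical screening decisions of two reviewers on ten abstracts.
reviewer_1 = ["include", "include", "exclude", "exclude", "include",
              "exclude", "exclude", "exclude", "include", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "include",
              "exclude", "include", "exclude", "include", "exclude"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 2))
```

Here the reviewers agree on 8 of 10 abstracts (80%), but 52% agreement would be expected by chance alone, so kappa is noticeably lower than the raw agreement. The two disputed abstracts would go to the consensus meeting or the third reviewer.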


Abstract Data 


To abstract data on participants, interventions, comparison
interventions, and study design of the primary studies, creating a
paper-based or electronic data abstraction form is advised.
Ideally, two reviewers will again abstract the results data and
assess the methodological quality of the primary studies.
Agreement between reviewers on validity assessment is then
assessed, with disagreement resolved in a consensus meeting as
described above.


Conduct Analysis 


Traditionally, reviews summed up the results of the included
primary studies, often describing the number of positive
trials and negative trials. If the positive trials
outnumbered the negative trials, the conclusion was built on
this simple sum. This is referred to as “vote counting”: the
conclusions of the individual studies guided the conclusion
of the review. Smaller studies had as much influence on the
conclusion as studies with a larger sample size. To overcome
the differences in primary study sample size, a
meta-analysis uses more sophisticated statistical methods
that take the sample size of each primary study into
account. This statistical pooling is a unique feature of a
meta-analysis.


Jargon Simplified: Positive Trials and Negative Trials
If the sample size of a trial is sufficient, the following
results are possible:

• Positive Trial: The treatment under investigation is
superior to the control treatment.

• Noninferior Clinical Trial: A clinical trial that shows
that a new treatment is equivalent to standard treatment.
It is also called a noninferiority trial.

• Negative Trial: The treatment under investigation is
inferior to the control treatment.


If the primary studies do not show clinical heterogeneity,
as discussed before, we can decide to pool the results.
Next, we need to determine the method for pooling results
from the primary studies. Specific software is available for
pooling results. Developed by The Cochrane Collaboration,
Review Manager (RevMan; Version 4.2 for Windows) is
freely available from The Nordic Cochrane Centre,
Copenhagen (www.cc-ims.net/RevMans/download). This program
can be used to calculate the magnitude of the treatment
effect. For continuous data, it will calculate the treatment
effect as the standardized mean difference (SMD). The
SMD is the difference in means divided by the standard
deviation (SD), where the SD is the pooled standard
deviation of participants' outcomes across the whole trial.
The SMD value does not depend on the measurement scale,
which is an important property. Thus, the SMD converts all
outcomes to a common scale, measured in units of SDs. For
dichotomous data, you can calculate odds ratios (OR) or
relative risks (RR) to describe the magnitude of the
treatment effect. Next, you can choose the fixed effect
inverse variance model for the SMD if the statistical tests
for heterogeneity are not significant. For the OR or RR, a
fixed effect assumption can be applied in the same situation
using the Mantel-Haenszel method. If the results of the
tests for heterogeneity do show significance, you can choose
to refrain from pooling the data, or you may choose a random
effects model. Some authors suggest always using a random
effects model because the primary studies can be perceived
as a random sample of all studies available at that time.
Data are usually presented in figures with 95% confidence
intervals (CIs) called forest plots (Fig. 12.1).


Reality check: Forest Plot

Review: Antibiotic prophylaxis for surgery for proximal femoral and other closed long bone fractures
Comparison: 01 ANTIBIOTIC AGENT (MULTIPLE DOSE) VERSUS PLACEBO OR NO TREATMENT
Outcome: 01 Deep wound infection

Study              Treatment   Control   Weight   Relative Risk (Random)
                   n/N         n/N       (%)      95% CI

01 hip fracture fixation
Bodoky 1993        1/124       6/115     8.0      0.15 (0.02, 1.26)
Ericson 1973       0/18        1/21      3.6      0.39 (0.02, 8.93)
Hedstrom 1987      0/56        1/65      3.5      0.39 (0.02, 9.29)
Tengve 1978        1/56        6/71      8.1      0.21 (0.03, 1.70)
Subtotal (95% CI)  254         272       23.1     0.23 (0.07, 0.78)
Total events: 2 (Treatment), 14 (Control)
Test for heterogeneity: chi-squared=0.35, df=3, P=0.95, I²=0.0%
Test for overall effect: z=2.35, P=0.02

02 hip endoprosthesis
Ericson 1973       0/5         1/9       3.8      0.56 (0.03, 11.57)
Subtotal (95% CI)  5           9         3.8      0.56 (0.03, 11.57)
Total events: 0 (Treatment), 1 (Control)
Test for heterogeneity: not applicable
Test for overall effect: z=0.38, P=0.7

03 unspecified hip fracture procedure
Boyd 1973          7/135       11/145    41.8     0.68 (0.27, 1.71)
Buckley 1990       0/121       1/108     3.5      0.30 (0.01, 7.23)
Burnett 1980       1/135       6/126     8.0      0.16 (0.02, 1.27)
Subtotal (95% CI)  391         379       53.2     0.52 (0.23, 1.17)
Total events: 8 (Treatment), 18 (Control)
Test for heterogeneity: chi-squared=1.77, df=2, P=0.41, I²=0.0%
Test for overall effect: z=1.58, P=0.1

04 operative management of other or unspecified closed fracture
Bergmann 1982      0/117       2/63      3.9      0.11 (0.01, 2.23)
Gatell 1984        2/134       4/150     12.5     0.56 (0.10, 3.01)
Paiement 1994      0/60        1/62      3.5      0.34 (0.01, 8.29)
Subtotal (95% CI)  311         275       19.8     0.37 (0.10, 1.42)
Total events: 2 (Treatment), 7 (Control)
Test for heterogeneity: chi-squared=0.88, df=2, P=0.64, I²=0.0%
Test for overall effect: z=1.45, P=0.1

Total (95% CI)     961         935       100      0.40 (0.22, 0.73)
Total events: 12 (Treatment), 40 (Control)
Test for heterogeneity: chi-squared=4.29, df=10, P=0.93, I²=0.0%
Test for overall effect: z=3.00, P=0.003

Horizontal axis of the plot (relative risk, logarithmic scale): 0.01, 0.1, 1, 10, 100


Fig. 12.1 This sample forest plot shows the point estimates of the individual primary studies and the pooled overall effect size described
as relative risk. (From Gillespie WJ, Walenkamp G. Antibiotic prophylaxis for surgery for proximal femoral and other closed long bone
fractures. Cochrane Database Syst Rev 2001;CD000244. Reprinted by permission.)
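
To make the pooling step more concrete, the sketch below pools the relative risks of the eleven comparisons in Fig. 12.1 with a fixed effect inverse variance model on the log scale. It is an illustration rather than a description of RevMan's internals: the function names are hypothetical, and adding 0.5 to every cell of a 2×2 table with a zero event count is an assumption made here. The figure itself uses a random effects model; because no heterogeneity was observed (0%), a fixed effect calculation should give essentially the same answer.

```python
import math

# (study, events treatment, N treatment, events control, N control), from Fig. 12.1
STUDIES = [
    ("Bodoky 1993", 1, 124, 6, 115),
    ("Ericson 1973 (fixation)", 0, 18, 1, 21),
    ("Hedstrom 1987", 0, 56, 1, 65),
    ("Tengve 1978", 1, 56, 6, 71),
    ("Ericson 1973 (endoprosthesis)", 0, 5, 1, 9),
    ("Boyd 1973", 7, 135, 11, 145),
    ("Buckley 1990", 0, 121, 1, 108),
    ("Burnett 1980", 1, 135, 6, 126),
    ("Bergmann 1982", 0, 117, 2, 63),
    ("Gatell 1984", 2, 134, 4, 150),
    ("Paiement 1994", 0, 60, 1, 62),
]


def log_rr_and_variance(a, n1, c, n2):
    """Log relative risk and its variance for one study."""
    if a == 0 or c == 0:
        # Assumed continuity correction: add 0.5 to each cell of the 2x2 table,
        # which raises each group size by 1.
        a, c, n1, n2 = a + 0.5, c + 0.5, n1 + 1, n2 + 1
    log_rr = math.log((a / n1) / (c / n2))
    variance = 1 / a - 1 / n1 + 1 / c - 1 / n2
    return log_rr, variance


def pool_fixed_effect(studies):
    """Inverse variance weighted average of the log RRs, with a 95% CI."""
    sum_weights = 0.0
    sum_weighted = 0.0
    for _, a, n1, c, n2 in studies:
        log_rr, variance = log_rr_and_variance(a, n1, c, n2)
        weight = 1 / variance  # larger, more precise studies get more weight
        sum_weights += weight
        sum_weighted += weight * log_rr
    pooled = sum_weighted / sum_weights
    se = math.sqrt(1 / sum_weights)
    return (math.exp(pooled),
            math.exp(pooled - 1.96 * se),
            math.exp(pooled + 1.96 * se))


pooled_rr, ci_low, ci_high = pool_fixed_effect(STUDIES)
print(f"Pooled RR {pooled_rr:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

Under these assumptions the result should come out close to the overall estimate reported in the figure, 0.40 (0.22, 0.73). Unlike vote counting, the large Boyd 1973 study dominates the pooled estimate because its variance is small.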

Often the results in the primary studies are presented in 
a noncomprehensive manner, making abstracting of the 
data cumbersome. Therefore, many reviewers have to con- 
tact the authors of the primary studies to get additional in- 
formation to be able to pool the results. This in itself can be 
a time-consuming practice and is not always successful. 


Examples from the Literature: Forest Plot


The forest plot (Fig. 12.1) shows the point estimates of the 
individual primary studies and the pooled overall effect 
size described as RR. 

You will have to explore heterogeneity in two ways: clinical
heterogeneity (clinical sources of interstudy differences),
which you examine before you pool the results, and
statistical heterogeneity, which you evaluate with
statistical tests while pooling the studies. RevMan
calculates these tests automatically and reports the I²
statistic, which describes the percentage of total variation
across studies that is due to heterogeneity rather than
chance. An I² value of 0% indicates no observed
heterogeneity. As the value of I² increases, the
heterogeneity between studies becomes more evident.

Subgroup analysis may be used to identify a priori
determined sources of clinical heterogeneity. Primary
studies may, for example, have used seemingly slightly
different surgical techniques. If the I² for all primary
studies together is 60%, heterogeneity is very likely. If
the I² decreases to 15% within clinically relevant subgroups
(with more comparable techniques in each group), it is
likely that the heterogeneity is explained by the
differences in surgical technique between the subgroups. If
the I² remains high, you should refrain from pooling the
data.


Key Concepts: Statistical Test for Heterogeneity 

I² is the percentage of total variation across studies due
to heterogeneity rather than chance. An I² value of 0%
shows no observed heterogeneity, and larger values indicate
increasing heterogeneity. I² ranges from 0% (no
heterogeneity) to 100%.
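
As a rough illustration of how this percentage arises, the sketch below derives I² from the chi-squared heterogeneity statistic (Cochran's Q) and its degrees of freedom, using the commonly used definition I² = (Q − df)/Q, truncated at zero. The function name is hypothetical; the first call uses the overall test reported in Fig. 12.1, and the second uses invented numbers.

```python
def i_squared(q, df):
    """I² as a percentage, from Cochran's Q and its degrees of freedom."""
    if q <= 0:
        return 0.0
    # Negative values (Q smaller than df) are set to zero.
    return max(0.0, (q - df) / q) * 100.0


# Overall test in Fig. 12.1: chi-squared = 4.29 with df = 10.
print(i_squared(4.29, 10))

# Hypothetical review with chi-squared = 25 and df = 10.
print(i_squared(25.0, 10))
```

The first call gives 0%, matching the figure; the second gives 60%, the level at which the text above considers heterogeneity very likely.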


Evaluating a Published Systematic Review 


Previously published Users’ Guides are available to help
critically appraise and evaluate systematic reviews.
Although designed to help in preparing a manuscript, the
QUOROM statement can also be used to evaluate the quality
of a published systematic review. A validated instrument
that evaluates the scientific quality of a systematic review
is the Oxman and Guyatt Index (Table 12.2). This index has
been used by others to evaluate the quality of published
systematic reviews in orthopaedic surgery (Table 12.1).

Finally, you can be confronted with overlapping sys- 
tematic reviews, which are systematic reviews with the 
same clinical question conducted by a different group of 
authors. To complicate matters, these reviews can have 
conflicting results. Jadad et al proposed an algorithm to
help readers analyze conflicting same-topic reviews and 
choose the most appropriate review (Fig. 12.2). One step 
of the Jadad algorithm is to evaluate the methodological 
quality of the review; here the Oxman and Guyatt Index 
can help to choose the most sound review. 


Sensitivity and Subgroup Analysis 


Sensitivity analysis or subgroup analysis may help to ex- 
plore differences in effect size based on the differences in 
methodological quality of the included primary studies 
as well as differences among the primary studies like treat- 
ment duration, differences in rehabilitation protocols, 
chronic or acute patients, and so on. These subgroup ana- 
lyses should be planned a priori; do not embark on data 
dredging after the analysis is done. 

Finally, the possibility of publication bias should be
explored. RevMan has a feature to do this graphically using
funnel plots. However, interpreting funnel plots is not
as easy as it seems. Other tests have recently been
described to examine potential publication bias.


Preparing the Manuscript 


Many scientific journals request authors to report their 
systematic review in a structured format. The QUOROM 
statement was developed to help authors report their find- 
ings in a standardized structured format to improve the 
quality of reporting. This facilitates the reader’s
interpretation of the review’s results. The QUOROM statement
can be found online at
http://www.consort-statement.org/QUOROM.pdf.¹


Table 12.2 Oxman and Guyatt Index 


1. Were the search methods used to find evidence stated?

2. Was the search for evidence reasonably comprehensive?

3. Were the criteria for deciding which studies to include in the
overview reported?

4. Was bias in the selection of studies avoided?

5. Were the criteria used for assessing the validity of the included
studies reported?

6. Was the validity of all studies referred to in the text assessed
using appropriate criteria?

7. Were the methods used to combine the findings of the relevant
studies reported?

8. Were the findings of the relevant studies combined appropriately
relative to the primary question the overview addresses?

9. Were the conclusions made by the author(s) supported by the
data and/or analysis reported in the overview?

Question 10 summarizes the previous ones and, specifically, asks
to rate the scientific quality of the review from 1 (being
extensively flawed) to 3 (carrying major flaws) to 5 (carrying minor
flaws) to 7 (minimally flawed). The developers of the index specify
that if the “partially/can’t tell” answer is used one or more times in
questions 2, 4, 6, or 8, a review is likely to have minor flaws at best
and it is difficult to rule out major flaws (i.e., a score ≤4). If the “no”
option is used on question 2, 4, 6, or 8, the review is likely to have
major flaws (i.e., a score ≤3).


Source: From Oxman AD, Guyatt GH. Validation of an index of the
quality of review articles. J Clin Epidemiol 1991;44:1271-1278.
Reprinted by permission.
Fig. 12.2 Jadad decision algorithm for interpretation of
discordant reviews. (From Jadad AR, Cook DJ, Browman GP. 
A guide to interpreting discordant systematic reviews. 


Conclusions 


Systematic reviews stand in contrast to narrative reviews 
in their focused clinical question and methodical ap- 
proach. Albeit the number of systematic reviews is increas- 
Economic Analysis


“To avoid criticism, do nothing, say nothing, be nothing.”
- Elbert Green Hubbard


Summary


In this chapter, key design issues in economic analyses in
surgery are presented. Types of economic analyses are
discussed, including cost-minimization analysis,
cost-effectiveness analysis, cost-utility analysis, and
cost-benefit analysis.¹ In addition, the relevance of
incorporating economic analyses into surgical trials is
emphasized.


Introduction


With an aging population and the rising costs of health
care, the importance of cost efficiency is more apparent
than ever. The development of new surgical interventions
often promises improved outcomes over previous treatments,
but at a much higher cost. Resources available for
spending in health care are limited, thereby forcing
policymakers to make difficult choices about which programs
or technologies to fund and how widely these should be
available to patients. These concerns led to economic
evaluation becoming a tool in health care research to show
the relative value of alternative interventions for
improving health. When claims are made in the surgical
literature that a novel surgical technique is superior to
the current technique, surgeons need to consider the
validity of the evidence in support of the claim.
Consideration should be given to the value of the foregone
benefits because the resource is not available for its best
alternative use. To make informed decisions, surgeons can
use economic analyses to decide whether the new surgical
procedure should be implemented.

An economic analysis is a set of formal, quantitative
methods used to compare alternative strategies with respect
to their resource use and their expected outcomes.¹

A full economic analysis must consider both the costs
and the outcomes or consequences. Four types of economic
analysis are commonly reported in the medical literature.


Key Concepts: Types of Economic Analyses¹
• Cost-minimization analysis

• Cost-effectiveness analysis

• Cost-utility analysis

• Cost-benefit analysis


Various types of economic evaluations are reported in the
literature; however, the basic principle of an economic
analysis is that choices must be made between alternative
uses of resources, with consideration of both cost and
outcome.¹ Therefore, studies can only be considered formal
economic evaluations when the costs and outcomes are
compared between two or more treatment options. All
evaluations that fail to satisfy these criteria should be
designated partial evaluations; their results cannot answer
efficiency questions.¹ In this chapter, the four common
types of economic analysis are described and a summary of
key items to consider when conducting an economic analysis
is provided.


Types of Economic Analysis
Cost-Minimization Analysis


“Cost-minimization analysis (CMA) is a type of
cost-effectiveness analysis that is used to compare cost
differences among competing alternatives when these
treatments are medically equivalent” (Fig. 13.1). It is the
most accurate and appropriate method when comparing the cost
between two therapeutically equivalent drugs or
interventions. Cost-minimization analysis is the most
commonly reported type of economic analysis. It is also the
simplest, as it measures only costs. In other words, a
cost-minimization analysis is an economic analysis that is
conducted in situations where the consequences of the
alternatives are identical, and so the only issue is their
relative costs.


Jargon Simplified: Cost-Minimization Analysis 
“Cost-minimization analysis is a type of cost-effective- 
ness analysis that is used to compare cost differences 
among competing alternatives when these treatments 
are medically equivalent.” 
Fig. 13.1 Cost-minimization analysis.


When conducting a cost-minimization analysis, one begins 
by obtaining acquisition price, then pharmacy, nursing, 
and physician costs associated with the drug or therapy. 
It is also necessary to consider the laboratory costs, the 
cost of supplies, and any other significant factors. Once 
each item is considered and the cost is obtained, the costs 
for each treatment are summed and then compared. Because
the drugs or interventions are therapeutically equivalent,
the one with the lower cost should become the treatment
that is implemented.
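
The summing and comparing described above can be sketched in a few lines. All cost items and amounts below are hypothetical and serve only to show the mechanics; the calculation is meaningful only when the two treatments are accepted as therapeutically equivalent.

```python
def total_cost(cost_items):
    """Sum all cost items recorded for one treatment."""
    return sum(cost_items.values())


# Hypothetical per-patient costs for two therapeutically equivalent treatments.
treatment_a = {"acquisition": 120.0, "pharmacy": 15.0, "nursing": 40.0,
               "physician": 60.0, "laboratory": 25.0, "supplies": 10.0}
treatment_b = {"acquisition": 95.0, "pharmacy": 15.0, "nursing": 55.0,
               "physician": 60.0, "laboratory": 30.0, "supplies": 10.0}

totals = {"A": total_cost(treatment_a), "B": total_cost(treatment_b)}
preferred = min(totals, key=totals.get)  # the lower total cost is preferred
print(totals, "-> preferred treatment:", preferred)
```

Here treatment B has the lower acquisition price but higher nursing and laboratory costs, which shows why every relevant item has to be counted before the totals are compared.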


Examples from the Literature: Example of a Cost- 
Minimization Analysis 

Source: Sabate A, Pena MJ, Vila C, Alemany O. Analysis of 
cost minimization of epidural anesthesia compared with 
general anesthesia in oncologic coloproctologic surgery. 
Anales de Medicina Interna 1997;14:291-296. 
Abstract: Combined general and epidural anesthesia in 
abdominal surgery has shown, both, protective and no 
effect on final outcome. The aim of this study was to 
evaluate combined epidural and general anesthesia. 
One hundred and eighty four patients, diagnosed of neo- 
plastic process, in which an elective procedure of colo- 
proctologic resection and reconstruction was scheduled 
during the period between January 1993 and December 
1994, were studied. In thirty consecutive patients com- 
bined general-epidural anesthesia (EA) was performed. 
These patients were compared with thirty general an- 
esthesia patients (GA), selected randomly from the 
same period. Both groups were comparable for demo- 
graphic characteristics and for the type and duration of 
the surgical procedure. Red Blood Cells units transfused
were 1.7 ± 3 in the EA group and 1.4 ± 1.9 in the GA
group. After the operation, most of patients went to
SICU. The length of the hospital stay was 13 ± 6 days
for GA group, while for EA group was 0.13 ± 5. The
hospital mortality for all operated patients (N = 184) was
1.1%, which were directly related to failure of surgical
anastomosis. The need for mechanical ventilation and
pulmonary complications were similar in both groups.
When analyzing costs, EA group represented a value
(pesetas) of 433,501 ± 183,337 for GA group and
437,735 ± 149,572 for EA group. As shown, in the actual
context, we conclude that the anesthetic technique did
not have any influence on outcome or on cost.


Fig. 13.2 Cost-effectiveness analysis.


Cost-Effectiveness Analysis 


The cost-effectiveness analysis (CEA) of a surgical
intervention integrates both the costs and the effectiveness
of the comparative procedures or approaches to a surgical
problem. In a CEA, the consequences or health outcomes are
not valued; instead, they are expressed in natural units
such as lives saved, units of blood pressure lowered, or
cases successfully treated, so that results can be reported
as, for example, cost per life saved (Fig. 13.2).


Jargon Simplified: Cost-Effectiveness Analysis 
“Cost-effectiveness analysis is an economic analysis in
which the consequences are expressed in natural units.”


When a comparison is made between two interventions,
we are interested in knowing the extra benefit gained
from the extra costs.¹ This is expressed as the incremental
cost-effectiveness ratio (ICER). The numerator of the ICER
is the difference in mean cost between the two
interventions, and the denominator is the difference in
mean effectiveness.


Key Concepts: Incremental Cost-Effectiveness Ratio 
(ICER) 
\[
\mathrm{ICER} = \frac{\Delta\,\text{cost}}{\Delta\,\text{effectiveness}}
= \frac{\text{mean cost}_A - \text{mean cost}_B}{\text{mean outcome}_A - \text{mean outcome}_B}
\]





Jargon Simplified: Incremental Cost-Effectiveness 
Ratio 

Incremental cost-effectiveness ratio (ICER) is the ratio of
the change in costs to the change in effects, that is, the
incremental cost of an intervention divided by its
incremental effectiveness.¹
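
A short worked example may help here. The sketch below computes an ICER from mean costs and mean outcomes of two interventions; the function name and all numbers are hypothetical (intervention A is the new treatment, B the standard, and the outcome is the proportion of cases successfully treated).

```python
def icer(mean_cost_a, mean_cost_b, mean_outcome_a, mean_outcome_b):
    """Incremental cost-effectiveness ratio of intervention A versus B."""
    delta_cost = mean_cost_a - mean_cost_b
    delta_effect = mean_outcome_a - mean_outcome_b
    if delta_effect == 0:
        raise ValueError("no difference in effectiveness; the ICER is undefined")
    return delta_cost / delta_effect


# Hypothetical values: A costs 12,000 and treats 90% of cases successfully,
# B costs 9,000 and treats 80% successfully.
value = icer(12000.0, 9000.0, 0.90, 0.80)
print(f"ICER: {value:,.0f} per additional case successfully treated")
```

With these invented figures, the extra 3,000 per patient buys 0.10 additional successes per patient, or about 30,000 per additional successfully treated case.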


When comparing one intervention to another, there are
nine possible outcomes (Fig. 13.3).¹ In cell one, the new
intervention is less expensive and more effective than
the standard treatment. Conversely, in cell two, the new
intervention costs more and is less effective than the
traditional treatment. Cells three to six all indicate
comparative cost and effectiveness that provide strong or
weak dominance. Cell eight occurs when the new intervention
is less effective and less costly, and cell nine occurs when
the two interventions have the same level of effectiveness
and cost. Cell seven, which represents the most commonly
encountered situation, occurs when a new intervention is
both more effective and more costly.


Fig. 13.3 Possible outcomes in the comparison of incremental
costs and incremental effectiveness of two interventions.
(Adapted from Drummond MF, O’Brien BJ, Stoddart GL, Torrance
GW. Methods for the economic evaluation of health care
programmes. 2nd ed. New York/Oxford: Oxford University Press;
1997. Reprinted by permission.)

Another way of looking at the most common outcomes of
a CEA is with a cost-effectiveness plane (Fig. 13.4). If the
new intervention is both less expensive and more effective,
then this new intervention is dominant, and this is referred
to as a “win-win” situation. It is not necessary to calculate
an ICER in a win-win situation. Conversely, if the new
intervention is both more expensive and less effective, this
is referred to as a “lose-lose” situation. There is also no
need to calculate an ICER in a lose-lose situation.¹
Typically, most new interventions fall into the upper right
quadrant of the cost-effectiveness plane (Fig. 13.4), where
they are both more effective and more expensive than the
current standard of care. It is difficult for policy makers
to interpret the results of a CEA when they fall into the
upper right quadrant, as it is hard to compare ICERs that
are based on different study outcomes. For example, it is
difficult to decide whether we should adopt a new surgical
procedure that costs $50,000 per limb saved compared with a
new blood pressure medication that reduces blood pressure by
10 units at a cost of $30,000. This is a disadvantage of
using a CEA.
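
The decision logic of the cost-effectiveness plane can be summarized in a small function. This is only a sketch: the function name and return labels are hypothetical, and the wording for the less-costly, less-effective quadrant is an assumption, since the text above does not prescribe how to handle it.

```python
def classify(delta_cost, delta_effect):
    """Place a new intervention on the cost-effectiveness plane.

    delta_cost and delta_effect are new minus standard.
    """
    if delta_cost < 0 and delta_effect > 0:
        return "win-win: new intervention dominates, no ICER needed"
    if delta_cost > 0 and delta_effect < 0:
        return "lose-lose: new intervention is dominated, no ICER needed"
    if delta_cost > 0 and delta_effect > 0:
        return "more effective and more costly: calculate the ICER"
    if delta_cost < 0 and delta_effect < 0:
        # Assumed label; the trade-off here also requires a judgment.
        return "less effective and less costly: trade-off to be judged"
    return "no difference in cost and/or effectiveness"


# Hypothetical differences between a new and a standard intervention.
print(classify(-500.0, 0.05))
print(classify(3000.0, 0.10))
```

Only the two off-diagonal quadrants require a ratio; in the win-win and lose-lose quadrants the choice is clear without one.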


Examples from the Literature: Example of a 
Cost-Effectiveness Analysis 

Source: Cota AM, Omer AA, Jaipersad AS, Wilson NV. 
Elective versus ruptured abdominal aortic aneurysm re- 
pair: a 1-year cost-effectiveness analysis. Ann Vasc Surg 
2005;19:858-861.

Abstract: Abdominal aortic aneurysm (AAA) is a life- 
threatening condition with an overall mortality of 80%. 
It predominantly affects men 65-74 years of age and is 
caused by focal distension of the main blood vessel in 
the abdomen. Most patients go undetected until their 
aneurysm ruptures. Controversy surrounds the most ap- 
propriate form of screening for AAA. Currently, screen- 
ing is only performed selectively in patients with per- 
ipheral vascular disease. Some patients have their AAA 
detected incidentally, while ultrasound examination of 
the abdomen is performed for other indications. These 
patients have the opportunity to undergo surveillance 
or elective surgery. The mortality rate of emergency sur- 
gical intervention following rupture (50%) is far worse in 
comparison to that of patients undergoing planned in- 
tervention under specialist vascular surgeons (5%). De- 
spite improvements in outcomes from elective interven- 
tion for AAA as a result of specialization, the overall mor- 
tality from this condition remains very high (80%) as the 
commonest presentation of an AAA is rupture. Screen- 
ing all men aged 65-74 years is considered too costly 
in the current economic climate. However the cost dif- 
ference between elective repair and emergency repair 
of AAA must be considered given that the outcome 
from elective AAA repair is far superior to that following 
ruptured AAA repair. Our objective was to retrospec- 
tively collect costs and outcomes of elective and emer- 
gency AAA repair to carry out a cost-effectiveness analy- 
sis. Four multi-professional teams in accident and emer- 
gency, operation theaters, intensive care, and surgical 
wards at the Kent and Canterbury Hospital were se- 
lected from health-care professionals including doctors, 
managers, nurses, and clerical staff with the purpose of
obtaining costs. Detailed cost data collection sheets were 
prepared to calculate costs, which included staff costs, 
consumables including drugs, intravenous fluids, equip- 
ment, investigations, laundry, catering, and stationery. 
An inventory of costs per item was obtained, and the to- 
tal cost was calculated from the number of items used. 
Outcomes were measured in terms of survival. The total 
costs of emergency AAA repair were pounds sterling 
96,700.69, with a cost per life saved of pounds sterling 
24,175.17. The total cost of elective AAA repair was 
pounds sterling 76,583.22, with a cost per life saved of 
pounds sterling 5,470.23. Emergency intervention for 
AAA was found to cost five times more than a planned 
intervention per life saved per year. 
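
The cost per life saved in this abstract is simply the total cost divided by the number of survivors. The sketch below retraces that arithmetic. Note that the survivor counts (4 after emergency repair, 14 after elective repair) are not stated in the abstract; they are inferred here from the reported totals and per-life figures and should be treated as an assumption.

```python
def cost_per_life_saved(total_cost, survivors):
    """Total cost of a program divided by the number of lives saved."""
    if survivors <= 0:
        raise ValueError("at least one survivor is needed")
    return total_cost / survivors


# Total costs in pounds sterling as reported in the abstract;
# survivor counts are inferred, not reported.
emergency = cost_per_life_saved(96700.69, 4)
elective = cost_per_life_saved(76583.22, 14)
ratio = emergency / elective

print(f"emergency: {emergency:,.2f}  elective: {elective:,.2f}  ratio: {ratio:.1f}")
```

Under this assumption the per-life figures match the abstract, and the emergency-to-elective ratio works out to about 4.4, which the abstract summarizes as five times more.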


Cost-Utility Analysis 


In a cost-effectiveness analysis, the effectiveness of a
surgical procedure can be measured in terms of lives saved,
limbs saved, or days off work averted. Such outcome
measures, however, do not allow one to easily compare
the benefits across different types of medical
interventions, for example, coronary bypass versus limb
transplantation. A cost-utility analysis (CUA) is able to
incorporate the increase in health-related quality of life
(Fig. 13.5). Third-party payers and policy makers would
prefer this type of presentation of effectiveness, as they
need to decide where to allocate scarce health care
resources.

A cost-utility analysis is a type of cost-effectiveness
analysis in which the consequences are expressed in terms of
life-years adjusted by people's preferences; typically, one
considers the incremental cost per incremental gain in
quality-adjusted life-years.


Jargon Simplified: Cost-Utility Analysis 

“A cost-utility analysis is a type of cost-effectiveness ana- 
lysis in which the consequences are expressed in terms 
of life-years adjusted by peoples’ preferences; typically, 
one considers the incremental cost per incremental 
gain in quality-adjusted life-years.”
Fig. 13.5 Cost-utility analysis.


Cost-utility analysis addresses these limitations by using a
common outcome measure (a metric) to account for the
broad range of relevant outcomes. In particular, the
consequences of the intervention are expressed as utilities,
which can be viewed as the preferences individuals or
society may have for a particular health state. Utility is
usually expressed as a decimal from zero to one, with zero
representing death and one representing perfect health.
There are different approaches to utility measurement.
Utilities (preferences) are global health-related quality of
life (HRQL) measures. They can be obtained from a visual
analog scale such as the “feeling thermometer,” from the
standard gamble or time-tradeoff techniques, or from
generic scales such as the Health Utilities Index (HUI) or
the EuroQol-5D (EQ-5D). These methods are described in
detail by Drummond et al.

One of the most commonly used outcome measures in CUA
is the quality-adjusted life-year (QALY), with the results
of the analysis expressed as cost per QALY gained. Briefly,
QALYs are calculated by multiplying the life-years gained
from an intervention by a utility weight, which can be
determined by various methods.¹ This common outcome measure
allows for incorporation of changes in both quantity of life
and quality of life.


Jargon Simplified: Quality-Adjusted Life-Years 
Quality-adjusted life-years (QALYs) are a unit of measure 
of survival that accounts for the effects of suboptimal 
health status and the resulting limitations in quality of 
life. 
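
The multiplication behind a QALY can be shown in a brief sketch. The function name and all values are hypothetical, and a single constant utility weight over the whole period is a simplifying assumption.

```python
def qalys(life_years, utility):
    """Life-years weighted by a utility between 0 (death) and 1 (perfect health)."""
    if not 0.0 <= utility <= 1.0:
        raise ValueError("utility must lie between 0 and 1")
    return life_years * utility


# Hypothetical example: the new procedure yields 10 life-years at utility 0.7,
# the standard procedure 8 life-years at utility 0.5.
qalys_new = qalys(10.0, 0.7)
qalys_standard = qalys(8.0, 0.5)
gain = qalys_new - qalys_standard
print(f"QALYs gained: {gain:.1f}")
```

In this invented case the new procedure adds only 2 life-years but about 3 QALYs, because it improves quality of life as well as survival.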


In many areas of clinical research, CUA is considered the
most relevant form of economic evaluation, as it allows
for valuation of quality of life and patients’ preferences
in the determination of cost-efficiency. In particular, CUA
should be the preferred method of economic analysis whenever
quality of life is an important outcome or when one
needs to compare alternatives that have different kinds
of outcome (e.g., mortality versus improved patient
function). By using a common outcome metric, it allows
comparison of surgical and medical interventions, thus
providing public health care policy decision makers with a
basis for deciding where to allocate scarce health care
resources.

Similar to a CEA, in a CUA an incremental cost-utility
ratio (ICUR) is calculated.


Key Concepts: Incremental Cost-Utility Ratio (ICUR) 
\[
\mathrm{ICUR} = \frac{\Delta\,\text{cost}}{\Delta\,\text{QALYs}}
= \frac{\text{mean cost}_A - \text{mean cost}_B}{\text{mean QALY}_A - \text{mean QALY}_B}
\]





When determining whether to accept a new intervention
following a CUA, the same principles apply as in
the CEA (Fig. 13.3, Fig. 13.4). When a new intervention
is both more effective and more costly, guidelines exist
to recommend whether to adopt or reject the new
intervention. A cost of less than $20,000 per QALY
gained provides strong evidence to adopt the new
technique, a cost between $20,000 and $100,000 per QALY
gained provides moderate evidence to adopt a new
intervention, and interventions costing more than
$100,000 per QALY gained should not be implemented.
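
The ICUR and the thresholds just described can be combined in a short sketch. The function names and the example values are hypothetical, and treating exactly $20,000 and exactly $100,000 per QALY as falling in the middle band is an assumption, because the text does not say how boundary values are handled. The grading applies only when the new intervention is both more effective and more costly.

```python
def icur(mean_cost_a, mean_cost_b, mean_qaly_a, mean_qaly_b):
    """Incremental cost-utility ratio of intervention A versus B."""
    delta_qaly = mean_qaly_a - mean_qaly_b
    if delta_qaly == 0:
        raise ValueError("no difference in QALYs; the ICUR is undefined")
    return (mean_cost_a - mean_cost_b) / delta_qaly


def evidence_grade(cost_per_qaly):
    """Grade a cost per QALY gained (in dollars) against the thresholds above."""
    if cost_per_qaly < 20000:
        return "strong evidence to adopt"
    if cost_per_qaly <= 100000:
        return "moderate evidence to adopt"
    return "should not be implemented"


# Hypothetical example: A costs $45,000 and yields 7.0 QALYs,
# B costs $30,000 and yields 6.5 QALYs.
ratio = icur(45000.0, 30000.0, 7.0, 6.5)
print(f"${ratio:,.0f} per QALY gained: {evidence_grade(ratio)}")
```

With these made-up numbers the extra $15,000 buys half a QALY, or $30,000 per QALY gained, which falls in the moderate band.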


Examples from the Literature: An Example of a Cost- 
Utility Analysis 

Source: Medical Research Council Laparoscopic Groin 
Hernia Trial Group. Cost-utility analysis of open versus 
laparoscopic groin hernia repair: results from a multi- 
center randomized clinical trial. Brit J Surg 2001; 
88:623-661.

Abstract: This study was a pragmatic economic evalua- 
tion performed alongside a multicentre randomized 
controlled trial comparing laparoscopic with open groin 
hernia repair. The primary economic evaluation frame- 
work employed was a cost-utility analysis. At 26 hospi- 
tals in the UK and Ireland, 928 patients with a groin her- 
nia were assigned randomly to laparoscopic or open re- 
pair. Cost data were identified and measured both 
within and without the trial. Cost data were combined 
with quality-adjusted life-years (QALYs) from the 
EQ-5D questionnaire to obtain cost-per-QALY ratios. 
The mean cost of laparoscopic hernia repair was
£1112.64, compared with £788.79 for the
open operation. The extra cost of £323.85 in the
laparoscopic group was mainly due to additional theater
time and increased equipment and sterilization costs.
The estimated incremental cost per QALY of the
laparoscopic over the open method was £55 548.00 (95
per cent confidence interval £47 216.00-£63 885.00).
While the results show that a high cost was incurred to
produce an additional QALY by using laparoscopic over
open hernia repair, sensitivity analyses
show that there are specific situations in which laparo- 
scopic repair may be a viable alternative, such as when 
reusable equipment is employed. 


Cost-Benefit Analysis 


The cost-benefit analysis (CBA) is a form of economic
analysis in which the costs and the consequences (including
increases in the length and quality of life) are expressed
in monetary terms. This means that all benefits and costs
of each intervention are measured in terms of their
equivalent money value. A technique referred to as
willingness to pay is used to assign a monetary value to the
consequences of each outcome.


Conducting an Economic Analysis


The ideal economic analysis would consist of a large
multicenter randomized controlled trial, which recruits a
few thousand patients (based on a formal sample size
calculation) comparing two interventions. Costs would then
be collected prospectively along with the sampled data.

An advantage of the CBA is that it permits a direct com- 
parison of various programs, as both costs and conse- 
quences are reported in the same units (dollars). The 
main criticism of the CBA in health interventions is that 
it may show bias toward the rich, if their willingness to 
pay were higher than that of the poor,’ For additional infor- 
mation on CBA, see Drummond et al.” 


Jargon Simplified: Cost-Benefit Analysis 

“Cost-benefit analysis is a form of economic analysis in 
which the costs and the consequences (including in- 
creases in the length and quality of life) are expressed 
in monetary terms.” 


Examples from the Literature: Example of a 
Cost-Benefit Analysis 

Source: Wang G, Macera CA, Scudder-Soucie B, Schmid T, 
Pratt M, Buchner D. A cost-benefit analysis of physical ac- 
tivity using bike/pedestrian trails. Health Promot Pract 
2005;6:174-179."° 

Abstract: From a public health perspective, a cost-bene- 
fit analysis of using bike/pedestrian trails in Lincoln, Ne- 
braska, to reduce health care costs associated with inac- 
tivity was conducted. Data was obtained from the city's 
1998 Recreational Trails Census Report and the litera- 
ture. Per capita annual cost of using the trails was 
209.28 U.S. dollars (59.28 U.S. dollars construction and 
maintenance, 150 U.S. dollars of equipment and travel). 
Per capita annual direct medical benefit of using the 
trails was 564.41 U.S. dollars. The cost-benefit ratio 
was 2.94, which means that every 1 U.S. dollar invest- 
ment in trails for physical activity led to 2.94 U.S. dollars 
in direct medical benefit. The sensitivity analyses indi- 
cated the ratios ranged from 1.65 to 13.40. Therefore, 
building trails is cost beneficial from a public health per- 
spective. The most sensitive parameter affecting the 
cost-benefit ratios were equipment and travel costs; 
however, even for the highest cost, every 1 U.S. dollar in- 
vestment in trails resulted in a greater return in direct 
medical benefit. 


When designing and conducting an economic analysis, 
one should consider collaborating with a health economist 
to ensure that the correct methodology is followed. Below 
are several additional items that need to be considered 
when conducting an economic analysis. 
Perspective


Several perspectives can be taken in an economic evalua- 
tion: the patient’s, the hospital’s, the primary payer's, 
and society’s. When considering costs and consequences,
it is important to be explicit about the perspective taken.
In addition to the direct health care costs, a surgical procedure
also incurs costs related to time off work and reduced
productivity. If these costs are not included, the analysis
will overlook an important financial burden
to the patient, their family, and society in general. A societal
perspective would include both direct costs and
productivity costs.


Collection of Costs 


A necessity in the performance of an economic analysis is
the accurate identification of costs associated with the interventions.
The estimate of costs will depend on the chosen perspective.
Identifying costs demands resource utilization information,
which can be obtained from a generic case report
form (CRF) or one that has been modified specifically for
a particular study. Case report forms usually are self-administered
by the patients at various time intervals after the
treatment. Depending upon the perspective chosen, the 
CRFs usually document the patient and caregiver demo- 
graphic, educational, and employment information; use 
of health care resources such as visits to family doctors, oc- 
cupational or physical therapists, surgeons, walk-in clinics, 
emergency departments, pain clinics, and visits by home- 
care; patient expenses related to medications and out-of- 
pocket transportation costs to receive additional medical 
care; and information related to patient and caregiver 
days off work and productivity costs. Case report forms 
can also be completed by surgeons or research staff to 
document in-hospital and medical resource utilization. 
Generally, only types and frequencies of resources are col- 
lected rather than costs. Only resources related to the pa- 
tients’ intervention are collected. Once the resources are 
collected, a cost is assigned to each item. It is important 
to report the physical quantities of the resources con- 
sumed or released separately from their prices because 
the price per quantity of an intervention may vary among 
different locations, including provinces, states, and coun- 
tries. Such separation will enable others to calculate the 
costs for each intervention in their own jurisdictions and 
reach separate conclusions regarding the cost-effective- 
ness of a new intervention. Another difficulty with valuing 
costs is that published charges may differ from actual costs, 
depending on the bargaining power of the health care in- 
stitutions, third party payers, and the profit margin in a 
for-profit health care system.” 
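
The recommendation to report physical quantities separately from prices can be illustrated with a short sketch. All resource names, counts, and unit prices below are hypothetical and are not taken from any study; the point is only that one set of recorded quantities can be re-costed with each jurisdiction's own price list.

```python
# Hypothetical resource use recorded on a case report form for one patient
# (types and frequencies only, no costs).
resource_use = {
    "family_doctor_visit": 3,
    "physical_therapy_session": 6,
    "days_off_work": 10,
}

# Hypothetical unit prices for two jurisdictions.
unit_prices = {
    "jurisdiction_a": {
        "family_doctor_visit": 40.0,
        "physical_therapy_session": 60.0,
        "days_off_work": 150.0,
    },
    "jurisdiction_b": {
        "family_doctor_visit": 55.0,
        "physical_therapy_session": 45.0,
        "days_off_work": 200.0,
    },
}


def total_cost(quantities, prices):
    """Multiply each recorded quantity by its unit price and sum the results."""
    return sum(count * prices[item] for item, count in quantities.items())


costs_by_jurisdiction = {
    name: total_cost(resource_use, prices) for name, prices in unit_prices.items()
}

for name, cost in costs_by_jurisdiction.items():
    print(f"{name}: {cost:.2f}")
```

Because the quantities are kept apart from the prices, a reader in a third jurisdiction needs only to supply a new price list to obtain a local cost estimate.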


Jargon Simplified: Case Report Form 

The case report form (CRF) is an important tool used in 
clinical trials to capture a required record of data and 
other information for each participant during a clinical 
trial. 


Clinical Effectiveness 


The preferred economic evaluation comparing two surgi- 
cal interventions is one in which economic data are col- 
lected alongside a randomized controlled trial providing 
high internal validity. However, the weakness of this 
method is the generalizability of the results. There may 
be low generalizability because of the strict inclusion cri- 
teria for the trial, so it is important to use a more pragmatic 
design, with less strict inclusion criteria when including an 
economic analysis as part of a randomized trial. Pooling 
the results of several trials in a meta-analysis may increase 
the generalizability because the pooled estimate of effectiveness
is derived from a wider spectrum of patients at
several different clinical settings. 


Jargon Simplified: Randomized Controlled Trial 

A randomized controlled trial is “an experiment in 
which individuals are randomly allocated to receive or 
not receive an experimental preventative, therapeutic, 
or diagnostic procedure and then followed to determine 
the effect of the intervention.”¹¹


Jargon Simplified: Meta-Analysis 

A meta-analysis is “an overview that incorporates a 
quantitative strategy for combining the results of several 
studies into a single pooled or summary estimate.”¹¹


Sensitivity Analysis 


A sensitivity analysis is one method of allowing for uncertainty
in economic evaluations. In general, sensitivity analysis
involves three steps:

1. Identify the uncertain parameters for which sensitivity
analysis is required.

2. Specify the plausible range over which the uncertain
parameters are thought to vary.

3. Calculate study results based on combinations of the
best guess, most conservative, and least conservative
estimates.¹


The simplest form of sensitivity analysis is a one-way sensitivity
analysis, in which the parameters are varied one at a
time to determine the impact on the study results. This is
the most common method of sensitivity analysis. A multi-way
sensitivity analysis recognizes that more than one
parameter may be uncertain and that each could vary within
its specified range. Another approach to sensitivity analysis
is a scenario analysis. Here a
series of scenarios is constructed, including a base case
(best guess), the most optimistic (best case), and the
most pessimistic (worst case) scenarios. The analyst may
also include other scenarios that they believe could apply.
The fourth type of sensitivity analysis is a threshold analysis.
Here the critical value(s) of a parameter(s) central to
the decision are identified and varied.¹
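
A one-way sensitivity analysis can be sketched in a few lines. The example below is only an illustration: the model (extra cost divided by extra QALYs), the base-case values, and the plausible ranges are all hypothetical. Each parameter is moved to the low and high end of its range in turn while the others stay at their base-case values.

```python
def cost_per_qaly(extra_cost, extra_qalys):
    """Hypothetical decision model: incremental cost per QALY gained."""
    return extra_cost / extra_qalys


# Hypothetical base case (best guess) and plausible range for each parameter.
base_case = {"extra_cost": 300.0, "extra_qalys": 0.01}
plausible_ranges = {
    "extra_cost": (100.0, 500.0),
    "extra_qalys": (0.005, 0.02),
}


def one_way_sensitivity(model, base, ranges):
    """Vary one parameter at a time and record the lowest and highest result."""
    results = {}
    for name, (low, high) in ranges.items():
        outcomes = []
        for value in (low, high):
            inputs = dict(base)   # copy the base case
            inputs[name] = value  # change only one parameter
            outcomes.append(model(**inputs))
        results[name] = (min(outcomes), max(outcomes))
    return results


sensitivity_results = one_way_sensitivity(cost_per_qaly, base_case, plausible_ranges)

for name, (lowest, highest) in sensitivity_results.items():
    print(f"{name}: {lowest:.0f} to {highest:.0f} per QALY")
```

The parameter whose range produces the widest spread of results is the one to which the conclusion is most sensitive.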


Jargon Simplified: Sensitivity Analysis 

Sensitivity analysis is “any test of the stability of the con- 
clusions of a healthcare evaluation over a range of prob- 
ability estimates, value judgments, and assumptions 
about the structure of the decisions to be made. This 
may involve the repeated evaluation of a decision model 
in which one or more of the parameters of interest are 
varied.”¹¹


Discounting 


An allowance needs to be made for the differential timing
of costs and consequences in an economic analysis because
of the notion of time preference. Even in a world where
there is no inflation and no bank interest, people prefer


Critical Appraisal of the Surgical Literature 


Checklists have been developed by the Surgical Users’ 
Guides on how to critically appraise and evaluate a surgical 
economic analysis effectively.¹²


Examples from the Literature: Checklist for the Critical 
Appraisal of a Surgical Economic Analysis 

“Are the results valid?

• Did the recommendations consider all relevant patient
groups, management options, and possible outcomes?

• Did investigators adopt a sufficiently broad viewpoint?

• Are results reported separately for patients whose
baseline risk differs?

• Is there a systematic review and summary of evidence
linking options to outcomes for each relevant question?

• Were costs measured accurately?

• Did investigators consider the timing of costs and consequences?
to receive a benefit earlier and incur a cost later, as it provides
them with more options.¹ Therefore, we need to discount
costs. Most analysts recommend rates of 3 and 5%.
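
Discounting converts a stream of future costs into a present value by dividing each year's cost by (1 + rate) raised to the number of years from now. The sketch below assumes a hypothetical stream of 1000 per year for three years, with the first year's cost incurred immediately and therefore not discounted; the two rates are those mentioned above.

```python
def present_value(amounts_by_year, rate):
    """Discount a list of yearly amounts (index 0 = now) to present value."""
    return sum(
        amount / (1 + rate) ** year
        for year, amount in enumerate(amounts_by_year)
    )


# Hypothetical cost stream: 1000 now, 1000 in one year, 1000 in two years.
yearly_costs = [1000.0, 1000.0, 1000.0]

pv_at_3_percent = present_value(yearly_costs, 0.03)
pv_at_5_percent = present_value(yearly_costs, 0.05)

print(f"Undiscounted total: {sum(yearly_costs):.2f}")
print(f"Present value at 3%: {pv_at_3_percent:.2f}")
print(f"Present value at 5%: {pv_at_5_percent:.2f}")
```

The higher the discount rate, the less weight later costs carry, which is why the choice of rate is itself a common subject of sensitivity analysis.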


Jargon Simplified: Discounting of Costs 

Discounting is the valuation of costs and consequences 
over time. In cost-effectiveness analyses, it is normally 
acceptable to discount costs and benefits occurring in 
the future to present values.¹


Key Concepts: Key Steps in Conducting a Surgical
Economic Analysis

1. Select the appropriate type of economic analysis
(cost-minimization analysis, cost-effectiveness analysis,
cost-utility analysis, or cost-benefit analysis).

2. Determine the perspective of the study.

3. Measure the clinical effectiveness.

4. Measure the resource utilization and then determine
costs.

5. Compare the costs and effectiveness of the two treatment
options.

6. Discount costs, if appropriate.

7. Conduct sensitivity analyses on the results.




What are the results?

• What were the incremental costs and effects of each
strategy?

• Do incremental costs and effects differ between subgroups?

• How much does allowance for uncertainty change
the results?

How can I apply the results to patient care?

• Are the treatment benefits worth the risks and costs?

• Can I expect similar costs in my setting?”


Source: From Guyatt GH, Rennie D, eds. Users’ Guides to 
the Medical Literature: A Manual for Evidence-Based 
Clinical Practice. Chicago, IL: AMA-Press, 2001, p. 627. 
Reprinted by permission. 
Conclusions


An economic analysis is an important consideration to 
be included in any research initiative. It is important to 
14
The Diagnostic Study 


“It is much more important to know what sort of a patient has a disease than what sort of
a disease a patient has.”

- William Osler (1849-1919)


Summary


In this chapter, the key features of a diagnostic study are
described and a systematic approach to evaluating the validity
of diagnostic studies is provided. The concepts of
likelihood ratios, sensitivity (the test property that describes
the proportion of individuals with the disorder in
whom the test result is positive), and specificity (the test
property that describes the proportion of individuals
without the disorder in whom the test result is negative)
are discussed.¹


Introduction


Establishing a diagnosis involves both pattern recognition
and logical reasoning; thus, both clinical training and experience
as well as knowledge of the medical literature inform
this process. When evaluating a patient’s presentation,
surgeons will typically call upon their clinical acumen
to establish possible diagnoses and estimate their relative
likelihood. Next, new information is acquired to further inform
these estimates, rule out some possibilities, and ultimately
assist in identifying the most likely diagnosis. With
each new finding, the likelihood of a potential diagnosis
moves from one probability, the pretest probability, to
the posttest probability. The question is “When has sufficient
information been acquired for a diagnosis and
when is further information required?”

Consider a patient who presents with severe shoulder
pain following a collision in a rugby game. He presents
with his arm held at the side in a position of slight abduction
and external rotation and there is loss of the normal
rounded contour of the deltoid muscle. Physical examination
reveals no nerve or blood vessel damage and X-rays
fail to detect any fracture around the joint. In this case,
the probability of an acute anterior shoulder dislocation
is so high (near 100%) that it is above a threshold where
no further testing is required (Fig. 14.1).

Consider another patient, a 25-year-old athlete, who 
presents with lateral rib cage pain after being struck by a 
baseball. An experienced clinician would recognize the 
clinical problem (posttraumatic lateral chest pain), the 
most likely diagnoses (rib contusion or rib fracture), and 
arrange testing to establish the diagnosis (plain films). 
Other diagnoses, such as myocardial infarction, are too un- 
likely to consider further. In short, although not as likely as 
a rib contusion, the possibility of a rib fracture is above a 
threshold for testing, while the probability of a myocardial 
infarction is below the threshold for testing (Fig. 14.1). 
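
The logic of the two thresholds in Fig. 14.1 can be written as a simple decision rule. This is a minimal sketch only: the function name and the numerical threshold values (5% and 90%) are hypothetical, since in practice the thresholds depend on the risks of the test, the treatment, and a missed diagnosis.

```python
def next_step(probability, test_threshold, treatment_threshold):
    """Classify a probability of diagnosis against the two thresholds."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if test_threshold > treatment_threshold:
        raise ValueError("test threshold cannot exceed treatment threshold")
    if probability < test_threshold:
        return "no testing warranted"
    if probability > treatment_threshold:
        return "treat without further testing"
    return "further testing required"


# Hypothetical thresholds chosen for illustration only.
TEST_THRESHOLD = 0.05
TREATMENT_THRESHOLD = 0.90

# Probabilities below are illustrative stand-ins for the scenarios in the text.
print(next_step(0.99, TEST_THRESHOLD, TREATMENT_THRESHOLD))  # near-certain dislocation
print(next_step(0.30, TEST_THRESHOLD, TREATMENT_THRESHOLD))  # possible rib fracture
print(next_step(0.01, TEST_THRESHOLD, TREATMENT_THRESHOLD))  # myocardial infarction
```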

[Figure 14.1 labels: Probability of Diagnosis. Probability below test threshold; no testing warranted. Probability between test and treatment threshold; further testing required.]

Fig. 14.1 Test and treatment thresholds in the diagnostic process.
(From Bhandari M, Montori VM, Swiontkowski MF, Guyatt GH.
Users’ guide to the surgical literature: how to use an article
about a diagnostic test. J Bone Joint Surg Am 2003;85:1133-1140.
Reprinted by permission.)

For a disorder with a pretest probability above the treatment
threshold, a confirming test that raises the probability
further would not assist diagnostically. At the other extreme,
for a disorder with a pretest probability below the
test threshold, an exclusionary test that further lowers
the probability would not help diagnostically. When a clinician
believes that the pretest probability is high enough to
test for, but not high enough to begin treatment, further
testing will be diagnostically useful, and will be most valuable
if it moves the probability across either threshold.
What factors determine the treatment threshold? Treatments
that carry a high risk of adverse effects argue for a
higher threshold. Invasive testing, or tests that are associated
with substantial risk, argue for a lower treatment


The Clinical Scenario 


Consider the following scenario: a 70-year-old woman is 
referred to your practice for assessment of new-onset hip 
pain; she underwent a total hip replacement ~3 years 
ago. Laboratory blood analysis has been done recently; 
although the white blood-cell count is elevated, the results 
for erythrocyte sedimentation rate and C-reactive protein
serum level are inconclusive. You recall a speaker at a re- 
cent conference suggesting that serum interleukin-6 (IL- 
6) may be a good marker of periprosthetic infection and 
decide to explore this further. 


The Search of the Medical Literature 


Your question is, “In patients with a previous total hip 
arthroplasty who are suspected of having an infection, 
what is the utility of an IL-6 serum test in diagnosing infection?”
You have recently learned about the Clinical Queries
function in PubMed, a quick way to narrow your search to 
identify articles that focus on diagnosis. Therefore, using 
the Clinical Queries search option in PubMed (http://www.ncbi.nlm.nih.gov/entrez/query/static/clinical.html),
you choose a narrow scope search (specificity option) for
articles on Diagnosis using the expression “serum interleu- 
kin-6 AND total joint arthroplasty.” This search yields a sin- 
gle article entitled, “Serum Interleukin-6 as a Marker of 
Periprosthetic Infection Following Total Hip and Knee Arthroplasty,”
by Di Cesare et al. A quick review of the abstract
indicates that it will likely provide the information
that you need. You obtain the article from your local hospi- 
tal library to answer three questions: (1) Are the results of 
the study valid? (2) What are the results? (3) Will the 
results help me in caring for my patients? 


Are the Results of the Study Valid? 


Investigators studying a diagnostic test hope to establish 
the power of that test to differentiate between patients 
who have the target condition and those who do not. A cri- 
tical issue to consider in determining the validity of a study 
on a diagnostic test is how the authors assembled the pa- 
tients and whether they used an appropriate reference 
standard for all patients to determine whether the patients 
did or did not have the target condition. 


threshold. The test threshold is similarly influenced: the
more serious a missed diagnosis, the lower the threshold;
the greater the risks associated with the test being considered,
the higher the testing threshold. Both clinical acumen
and the medical literature can inform this decision-making
process.


Examples from the Literature: Checklist for the Critical 
Appraisal of Studies of Diagnostic Tests 


“Are the results valid?

• Did clinicians face diagnostic uncertainty?

• Was there a blind comparison with an independent
gold standard applied similarly to the treatment group
and the control group?

• Did the results of the test being evaluated influence
the decision to perform the reference standard?

What are the results?

• What likelihood ratios were associated with the range
of possible test results?

How can I apply the results to patient care?

• Will the reproducibility of the test result and its interpretation
be satisfactory in my clinical setting?

• Are the results applicable to the patient in my practice?

• Will the results change my management strategy?

• Will patients be better off as a result of the test?”


Source: From Bhandari M, Montori VM, Swiontkowski 
MF, Guyatt G. Users’ guide to the surgical literature: 
how to use an article about a diagnostic test. J Bone Joint 
Surg Am 2003; 85:1133-1140. Reprinted by permission. 


Did Clinicians Face Diagnostic Uncertainty? 


A diagnostic test is only useful to the extent that it distin- 
guishes between disorders that might otherwise be con- 
fused. Almost any test can differentiate healthy persons 
from severely ill ones and clinicians look to administer 
tests when there is diagnostic uncertainty, that is, when 
the test results for patients with the target condition are si- 
milar to the test results for patients without the target con- 
dition. In the latter group, diagnoses other than the target 
condition are responsible for the similarity of the test re- 
sults between groups. For example, the white blood-cell 
count will almost always be elevated in patients who pre- 
sent with an obvious hip infection. On the other hand, the 
white blood-cell count will almost never be elevated in 
healthy controls. However, the diagnostic utility of a white 
blood-cell count is poor in patients who may have early 
septic arthritis, but who also may have another condition 
that elevates the white blood-cell count, such as a bacterial 
infection or recent trauma. 
Lijmer et al., in a report on bias in studies of diagnostic
tests, demonstrated that studies involving patients with
severe disease and healthy volunteers overestimated diagnostic
test performance by a factor of 3 (relative diagnostic
odds ratio = 3.0; 95% confidence interval [CI] = 2.0-4.5). As
such, surgeons should review studies on diagnostic tests to
assure themselves that there was diagnostic uncertainty
among the included patients. The study by Di Cesare et
al. reported test results for 58 patients who underwent reoperation.
All patients were referred for evaluation of complications
after knee or hip implantation; thus, it seems
that the authors did face diagnostic uncertainty.


Was There an Independent, Blind Comparison with a 
Reference Standard? 


The accuracy of a diagnostic test is established by compar- 
ing it with the truth - in other words, was the test positive 
among those patients that actually had the target condi- 
tion? Accordingly, surgeons should assure themselves 
that study investigators independently applied both the 
test under investigation and an appropriate reference stan- 
dard (such as biopsy, autopsy, or long-term follow-up) to 
every patient. By independent, we mean that the indivi- 
dual interpreting the reference standard should be una- 
ware of the results of the test and that the individual inter- 
preting the test should be unaware of the results of the re- 
ference standard. To the extent that this blinding is not 
achieved, the investigation is likely to overestimate the diagnostic
power of the test. In the study by Lijmer et al., lack
of blinding resulted in a significant overestimation of the
test performance (relative diagnostic odds ratio = 1.3;
95% CI = 1.0-1.9; P < 0.05).

In the study by Di Cesare et al., all patients underwent
measurement of serum IL-6 level and separate testing to 
determine the presence or absence of periprosthetic infec- 
tion. The authors did not describe clearly whether the as- 
sessments were performed in an independent and blinded 
fashion. The investigators established the actual presence 
of periprosthetic infection by “at least ten polymorphonuc- 
lear leukocytes per high-power field on postoperative his- 
tological examination of pathological specimens and the 
presence of bacteria on culture of intraoperative speci- 
mens obtained before the administration of antibiotics.” 


Did the Results of the Test Being Evaluated Influence the 
Decision to Perform the Reference Standard? 


The properties of a diagnostic test will be distorted if the 
results of the test influence the decision to carry out the reference
standard. This situation, called verification bias
or workup bias, applies when, for example, investigators
only confirm target condition status with the reference 
standard for patients who have a positive test result and 
assume that those who have a negative test result do not 
have the target condition. In practice, this leads to an overestimation
of the ability of the test being evaluated to differentiate
between patients who have the target condition
and those who do not. In the study by Lijmer et al., the test
performance was overestimated twofold in studies in
which different reference standards were used for patients
who had the target condition and those who did not (relative
diagnostic odds ratio = 2.2; 95% CI = 1.5-3.3).

For example, verification bias was a potential problem in 
a study of the value of ventilation-perfusion lung scanning 
in the diagnosis of pulmonary embolism (the PIOPED 
study).⁸ Patients whose ventilation-perfusion scans were
interpreted as “normal/near normal” and “low probability”
were less likely to undergo pulmonary angiography
than those with more positive ventilation-perfusion scans;
specifically, 69% of the patients in the former group and
92% of those in the latter group underwent angiography.⁸
This finding is not surprising as clinicians might be reluc- 
tant to subject patients who have a low probability of pul- 
monary embolism to the risks of angiography. In this case, 
however, the investigators dealt successfully with the bias 
by constructing an alternative reference standard for pa- 
tients who did not undergo angiography. They followed 
these untreated patients for 1 year to ensure that they re- 
mained free of evidence of pulmonary embolism during 
this period of time. 

In the study by Di Cesare et al., all patients underwent
measurement of serum IL-6 level and testing with the re- 
ference standard to determine the presence or absence of 
periprosthetic infection. Thus, the results of the serum 
IL-6 test did not influence the decision to conduct refer- 
ence standard investigations in these patients. 


What Are the Study Results? 


The starting point for any diagnostic process is to deter- 
mine the probability that the target disease is present in 
a given patient group before the next diagnostic test is per- 
formed. This is referred to as the pretest probability. How 
can surgeons estimate pretest probability? Literature on 
the probability of disease given a certain presentation 
and clinical experience can help surgeons to estimate pret- 
est probability. Other information that can be used to esti- 
mate pretest probability can be found in studies evaluating 
the utility of a diagnostic test. 

Returning to our clinical scenario, we can use the history 
and clinical examination to arrive at a pretest probability 
(the probability of infection before the result of the serum 
IL-6 protein test is obtained). The patient's elevated white 
blood-cell count and new-onset hip pain raise concern
that she may have a periprosthetic infection. The wound 
is neither erythematous nor warm to the touch. Based on 
this information, you estimate that your patient has a 
30% pretest probability of a periprosthetic infection. The 
next step is to decide how the results of the serum IL-6 
test would affect your estimate of the probability of infec- 
tion. In other words, surgeons should be interested in the 
characteristic of the test that indicates the direction and 
magnitude of this change in probability. This characteristic
of the test is termed the likelihood ratio. The likelihood ratio
(LR) is the characteristic of the test that links the pretest
probability to the posttest probability (that is, the probability
of the target condition after the test results are obtained).


What Are the Likelihood Ratios Associated with the Test 
Results? 


Table 14.1 presents results from the study by Di Cesare et
al. There were 17 patients who had a proven infection and
41 patients in whom infection was ruled out. For all pa- 
tients, the serum IL-6 level was classified as positive (>10 
pg/mL) or negative (<10 pg/mL). How likely is a negative 
serum IL-6 test among patients who have a periprosthetic 
infection? Table 14.1 reveals that the serum IL-6 level was 
not in the normal range for any of the patients with an in- 
fection, but was normal in 95% (39 of 41) of patients with- 
out an infection. The ratio of these two proportions (0.00/ 
0.95) is the likelihood ratio for a negative IL-6 test and is 
equal to 0.0. Thus, according to these study results patients 
with a periprosthetic infection will always provide a posi- 
tive serum IL-6 test. And, a positive serum IL-6 test is 20.5 
times more likely to occur in patients with a periprosthetic 
infection than in those without an infection (Table 14.1). 
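
The two likelihood ratios can be reproduced directly from the four cell counts of Table 14.1. The sketch below uses the counts reported for the study by Di Cesare et al.; the function and argument names are illustrative. Note that the formula for a positive test is undefined when there are no false positives, and the formula for a negative test is undefined when there are no true negatives.

```python
def likelihood_ratios(tp, fp, fn, tn):
    """Return (LR for a positive test, LR for a negative test) from a 2x2 table."""
    with_condition = tp + fn      # patients with proven infection
    without_condition = fp + tn   # patients in whom infection was ruled out
    lr_positive = (tp / with_condition) / (fp / without_condition)
    lr_negative = (fn / with_condition) / (tn / without_condition)
    return lr_positive, lr_negative


# Cell counts from Table 14.1: 17 true positives, 2 false positives,
# 0 false negatives, 39 true negatives.
lr_positive, lr_negative = likelihood_ratios(tp=17, fp=2, fn=0, tn=39)

print(f"LR for a positive test: {lr_positive:.1f}")
print(f"LR for a negative test: {lr_negative:.1f}")
```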

How can we use the likelihood ratio? The likelihood ratio
tells us how much the pretest probability increases or decreases.
For instance, a likelihood ratio of 1.0 will not
change the pretest probability, whereas a likelihood ratio
of >1 will increase it and a likelihood ratio of <1 will
decrease it. Likelihood ratios of >10 or <0.1 generate
large and often conclusive changes in the posttest
probability, likelihood ratios from >5 to 10 or from 0.1 to
0.2 generate moderate shifts in posttest probability, likelihood
ratios from >2 to 5 or from >0.2 to 0.5 generate small
(but sometimes important) changes in probability, and
likelihood ratios from >1 to 2 or from >0.5 to 1 are unlikely
to produce clinically important changes to the posttest
probability.


Jargon Simplified: Likelihood Ratio 

The likelihood ratio (LR) is the likelihood that a given test 
result would be expected in a patient with the target dis- 
order compared with the likelihood that that same result 
would be expected in a patient without the target disor- 
der. 


Having determined the likelihood ratios, how do we use 
them to link the pretest probability to the posttest probability?
Fagan¹⁰ proposed a nomogram for converting pretest
probability to posttest probability with use of likelihood
ratios. The clinician obtains the posttest probability
by placing a straight edge that aligns the pretest probabil- 
ity to the likelihood ratio for the diagnostic test. For our 
sample patient who has a pretest probability of 30% based 
on history and clinical examination and a negative serum 
IL-6 test (LR = 0.0), the posttest probability is zero. If the 
serum IL-6 test had been positive (LR = 20.5), then the 
posttest probability of a periprosthetic infection would 
have increased to 91% (Fig. 14.2). An interactive, Inter- 
net-based nomogram is available through the Oxford Cen- 
tre for Evidence Based Medicine (http://www.cebm.net/ 
nomogram.asp). 
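
The nomogram is a graphical shortcut for a calculation on the odds scale: convert the pretest probability to odds, multiply by the likelihood ratio, and convert back to a probability. The following is a minimal sketch of that arithmetic for the sample patient; the function name is illustrative. For the positive test the calculation gives roughly 0.90, in line with the value of about 91% read off the nomogram.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability via odds."""
    if not 0.0 <= pretest_probability < 1.0:
        raise ValueError("pretest probability must be at least 0 and below 1")
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


pretest = 0.30  # estimate from history and clinical examination

after_positive_test = posttest_probability(pretest, 20.5)
after_negative_test = posttest_probability(pretest, 0.0)

print(f"Posttest probability after a positive test: {after_positive_test:.2f}")
print(f"Posttest probability after a negative test: {after_negative_test:.2f}")
```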


Sensitivity, Specificity, and Predictive Value 


Sensitivity is the property of the test that describes the
proportion of patients with the disorder in whom the
test result is positive. Specificity is the property of the
test that describes the proportion of patients without the
disorder in whom the test result is negative. To calculate
sensitivity for our example study, we divide the total number
of patients who had a proven infection and a positive
test (true-positives; n = 17) by the total number of patients
who had a proven infection (true-positives + false negatives;
n = 17). Thus, the sensitivity is 100%. To calculate specificity,
we divide the total number of patients who had no
infection and a negative serum IL-6 test (true-negatives;
n = 39) by the total number of patients who had no infection
(true-negatives + false positives; n = 41). Therefore,
the specificity is 95% (Table 14.1). Tests with high sensitivity
are useful for ruling out disease, and tests with high specificity
are useful for ruling in disease.


Table 14.1 Likelihood Ratios for a Positive and Negative Serum Interleukin-6 Test*

Serum Interleukin-6 Test    Periprosthetic Infection                       Total
                            Yes                     No
Positive (>10 pg/mL)        17 True positive (a)    2 False positive (b)   19
Negative (<10 pg/mL)        0 False negative (c)    39 True negative (d)   39
Total                       17                      41

* Likelihood ratio (for positive test): (a/[a+c])/(b/[b+d]) = sensitivity/(1-specificity) = (17/17)/(2/41) = 20.5
Likelihood ratio (for negative test): (c/[a+c])/(d/[b+d]) = (1-sensitivity)/(specificity) = (0/17)/(39/41) = 0.0
Sensitivity: a/(a+c) = 17/17 = 100%
Specificity: d/(b+d) = 39/41 = 95%
Positive predictive value: a/(a+b) = 17/19 = 89%
Negative predictive value: d/(c+d) = 39/39 = 100%
Accuracy: (a+d)/(a+b+c+d) = 56/58 = 97%
Prevalence: (a+c)/(a+b+c+d) = 17/58 = 29%

Source: From Guyatt G, Rennie D, eds. Users’ Guides to the Medical Literature. A Manual for Evidence-Based Clinical Practice.
Chicago, IL: AMA Press; 2002. Reprinted by permission.


Fig. 14.2 Fagan’s nomogram, an example of a positive serum IL-6
test. (From http://www.cebm.net/index.aspx?0=1161. Reproduced
by permission.)


Jargon Simplified: Sensitivity 
Sensitivity is the proportion of people with disease who 
have a positive test. 


Jargon Simplified: Specificity 
Specificity is the proportion of people free of a disease 
who have a negative test. 
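The calculations summarized in Table 14.1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `diagnostic_metrics` and its dictionary keys are our own labels, and the cell letters a to d follow the table. Note that the positive likelihood ratio is undefined when specificity is exactly 100%, and the negative likelihood ratio is undefined when specificity is 0.

```python
def diagnostic_metrics(a, b, c, d):
    """Test properties from a 2 x 2 table.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives (as labeled in Table 14.1).
    """
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    total = a + b + c + d
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "positive_predictive_value": a / (a + b),
        "negative_predictive_value": d / (c + d),
        "accuracy": (a + d) / total,
        "prevalence": (a + c) / total,
        # Fails with ZeroDivisionError if specificity is exactly 1.
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Counts from the serum IL-6 example in Table 14.1.
metrics = diagnostic_metrics(a=17, b=2, c=0, d=39)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Running the sketch on the counts in the table should reproduce the percentages and likelihood ratios listed beneath it.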


Will the Study Results Help Me in Caring for 
My Patients? 


Having assessed the validity of the article and performed 
the necessary simple calculations to understand its results, 
you can ask yourself whether these results will help you in 
caring for your patient. 

The value of a diagnostic test often depends on its repro- 
ducibility when applied to patients. If a test requires much 
interpretation or involves the use of laboratory assays, var- 
iation in test results can occur. If a study indicates that a 
test is highly reproducible, two possibilities are likely: 
either the test is quite simple and easy to apply to patients
or the investigators involved in the study were highly
skilled in applying the diagnostic test to the study patients.
If the latter is true, the diagnostic test may not be useful in a
setting in which nonskilled interpretation of the test is 
likely to occur. 

Another important issue to consider is the similarity of 
your patient to those in the study. The properties of a diag- 
nostic test can change with different disease severities. For 
instance, the test may not perform as well in a community 
practice, where less-complicated cases will have to be dis- 
tinguished from multiple competing diagnoses. In the 
study by Di Cesare et al, the patients were assessed in a
university hospital by a group of surgeons specializing in 
total joint replacement. In that setting, surgeons were 
more likely to encounter patients with more severe or 
complicated disease in whom the diagnostic test (serum 
IL-6 level) was likely to perform better. In that setting, al- 
ternative diagnoses may have already been explored and 
ruled out. Likelihood ratios tend to move away from the va- 
lue of 1 when all patients who have the target disorder 
have severe disease, and they tend to move toward the va- 
lue of 1 when all patients who have the target disorder 
have mild disease. In general, however, if you practice in
a similar setting to that presented in a diagnostic study 
and your patient meets the study eligibility criteria, you 
can be confident in applying the results of the study to 
your patient. 

Once you have decided that the results are, in fact, ap- 
plicable to your patient, you must decide whether they 
will change your management of the patient. Before mak- 
ing any decisions, you must have a sense of what probabil- 
ities would confirm or refute the target diagnosis. For ex- 
ample, suppose you are willing to proceed with implant re- 
moval, irrigation, and débridement without further testing 
in patients who have a ≥80% probability of infection (realizing
that you will be operating on 20% of patients unnecessarily).
Moreover, suppose you are willing to reject
the diagnosis of infection if the posttest probability is <10%.
In a 70-year-old woman with hip pain and overt signs of
infection (pretest probability, 80%) and a negative serum IL-6
test, the posttest probability of periprosthetic infection 
would be 0.0% and you would not proceed with further 
testing before abandoning infection as a diagnosis. How- 
ever, in the 70-year-old woman with hip pain (pretest 
probability, 30%) and a positive serum IL-6 test, the postt- 
est probability of infection would be 91% and you would 
not conduct further testing for periprosthetic infection. 
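The step from pretest to posttest probability that Fagan's nomogram performs graphically can also be done arithmetically: convert the pretest probability to odds, multiply by the likelihood ratio, and convert back. The sketch below is illustrative only; the function name `posttest_probability` is our own. Exact arithmetic gives roughly 90% for the positive-test example, in line with the approximately 91% read from the nomogram.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a posttest probability.

    Works through odds: posttest odds = pretest odds x likelihood ratio.
    The pretest probability must be strictly less than 1.
    """
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


# Pretest probability 30%, positive serum IL-6 test (LR+ = 20.5).
positive_case = posttest_probability(0.30, 20.5)

# Pretest probability 80%, negative serum IL-6 test (LR- = 0.0).
negative_case = posttest_probability(0.80, 0.0)

print(f"Positive test: {positive_case:.1%}")
print(f"Negative test: {negative_case:.1%}")
```

Comparing each result with the treatment threshold (≥80%) and the rejection threshold (<10%) described above indicates whether further testing is needed.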

After this exercise, you can establish if your patient will 
be better off having had the test. A test becomes more va- 
luable when it has acceptable risks, the target disorder has 
major consequences if left untreated, and the target disorder
can be readily treated if diagnosed. Serum IL-6 testing
poses minimal risk to the patient and may be extremely
valuable for diagnosing infection; periprosthetic infection
can be successfully treated; and the consequences of
untreated infection can be extremely serious.
Resolution of the Clinical Scenario


The patient in the scenario at the beginning of this report 
had a pretest probability of infection of 30%. Her positive 
serum IL-6 test (likelihood ratio, 20.5) increased her prob- 


Conclusions 


In conclusion, when one has found an article of interest on 
a diagnostic test, it is necessary to assess the quality of the 
evidence therein. To the extent that the quality is poor, the 
inferences that are drawn from the study will be wea- 
kened; however, if the quality is acceptable, then surgeons 
must consider if the findings can be generalized to their 
own patient. Finally, it is necessary to weigh the benefits
15
The Reliability Study 


“All truth passes through three stages. First, it is ridiculed. Second, it is violently opposed.
Third, it is accepted as being self-evident.”

- Arthur Schopenhauer


Summary


The elusive concept of reliability, a term that is frequently
encountered in orthopaedic research studies, is addressed
in this chapter. The definition and interpretation of
frequently encountered reliability statistics are discussed. A
practical, step-by-step approach to designing and conducting
a study to determine the reliability of a test is outlined.
Finally, tools are provided to critically appraise a published
reliability study.


Introduction


The term “reliability” relates to the amount of error that is
inherent in any measurement that is taken. Some researchers
use many terms synonymously, such as agreement,
reproducibility, association, stability, and precision.
Several terms have formal definitions that are similar to
reliability, whereas others are poorly defined. None of these
terms is exactly synonymous, so to avoid confusion it is
simplest to restrict oneself to the term reliability. To
understand the concept of reliability fully, it is important to
understand some issues surrounding measurement.


Measurement


Measurement is a common part of clinical and research
practice. Every day, and perhaps with every patient
encounter, surgeons perform or interpret measurements.
The outcomes being measured may appear subjective,
such as degree of pain and quality of life, or they may
appear to be completely objective, such as height and range
of motion (ROM). However, even the most objective
outcomes (such as height) are associated with a degree of
measurement error. Subconsciously, surgeons are aware
of measurement error, and for every measurement taken,
decisions are made about how much error is acceptable
based on context.

For example, when measuring ROM of the knee in a
clinical setting, most surgeons are comfortable using
subjective observations of the patient flexing and extending. If
a different surgeon were to observe the same patient
moving the same knee, he or she may estimate the ROM to be
10 to 20 degrees different from the first surgeon’s
estimate. This would not be a major concern when the issue
relates to subjective observation. However, in a clinical
study of two interventions where ROM is an outcome,
a difference of 10 to 20 degrees between measurements
is likely to be quite important. Researchers would likely
not accept simple observation as an acceptable method
of measurement, and instead employ a goniometer to
measure ROM more reliably.


One factor in determining how much error we are will- 
ing to accept is related to the purpose of the measurement. 
In the first example, the clinician would likely observe the 
ROM and classify the patient into a subjective (and perhaps 
subconscious) classification system, such as “normal range 
of motion,” “somewhat limited range of motion,” or “very 
limited range of motion.” This assessment would then be 
used to guide treatment recommendations. It is unlikely 
that a difference of 10 to 20 degrees would alter the recom- 
mendation; therefore, surgeons would probably be com- 
fortable using this approach. In a research setting, the pur- 
pose of the measurements is to compare patients; there- 
fore, it is more important that the measurement instru- 
ment produces the same results in different settings. 

Another important factor related to measurement error 
is the expected range in the observations we are measur- 
ing. If the goal of the measurement is to detect large differ- 
ences (for example, 90 degrees of flexion/extension), a 
measurement error of 10 to 20 degrees is probably acceptable.
Alternatively, if the goal of the measurement is to detect
smaller differences (of the magnitude of 5 to 10 degrees),
an instrument with a measurement error of 10 to
20 degrees would be useless. The underlying principle 
here is that the error of the measurement should be a rela- 
tively small fraction of the expected range in observations. 
Jargon Simplified: Measurement Error
Measurement error refers to the degree of uncertainty
present in any measurement that is taken. The amount
of measurement error is related to the subjects, the testers,
the instrument being used, and any other factors
that could interfere with the measurement process.


Reliability Defined


The concept of reliability refers to the notion that measurement
error should be minimized. In other words, measurements
at different times, by different observers, in different
situations, or by parallel tests should produce similar
results.¹ The formal definition of reliability also incorporates
the principle that the measurement error should be
small relative to the range of observations expected in
the population. Statistically, the range of observations in
the population is expressed by the term “true subject
variability,” and reliability essentially reflects how small the
measurement error is relative to the true subject variability
(as can be seen in the box below, the formal definition is
slightly more complicated, simply to obtain a result between 0
and 1).


Jargon Simplified: Definition of Reliability
Reliability = true subject variability / (true subject variability + measurement error)


Several important observations emerge from this definition
of reliability. To begin, reliability is not a fixed property
of an instrument, but rather a property of an instrument
in a specific set of circumstances. For example, imagine
a group of investigators measured the reliability of a
new instrument for the evaluation of gait in a group
consisting of 50 healthy men and 50 men who had just undergone
below-knee amputations. Irrespective of the content
of the new instrument, we would expect the reliability
measured to be extremely high because the range of gaits
expected is very large (half the participants will have very
good gaits; the other half will have very poor gaits). It
would be foolish to conclude that such an instrument
was reliable and conduct a clinical trial comparing two
strategies for hip fractures using this instrument as an
outcome measure. The reliability of an instrument is dependent
on the participants, the observers, the timing of the
measurement, and any other factors that could alter the
subject variability or the measurement error.

Another implication of the definition is that a test that
does not discriminate between subjects will have poor
reliability. Even if the measurement error is small, if the true
subject variability is also small (due to the participants or
the test), the reliability will be low. Taken to the extreme,
if the gait in a cohort of patients with mild osteoarthritis
were evaluated using a simple test of “Able to walk” or
“Not able to walk,” one would expect all the patients to
receive a rating of “Able to walk.” Therefore, the true subject
variability would be 0, and the reliability would also be 0.
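The definition of reliability given in this section can be illustrated numerically. In the sketch below, the function name `reliability` and all of the variance values are hypothetical and chosen only to show the behavior described in the text: the same measurement error yields high reliability in a heterogeneous group and low reliability in a homogeneous one.

```python
def reliability(true_subject_variability, measurement_error):
    """Reliability = true variability / (true variability + error).

    Both arguments are variances (non-negative numbers).
    """
    total = true_subject_variability + measurement_error
    if total == 0:
        raise ValueError("reliability is undefined when there is no variability")
    return true_subject_variability / total


# Hypothetical variances (arbitrary units), same measurement error in each case.
mixed_group = reliability(true_subject_variability=400.0, measurement_error=25.0)
similar_group = reliability(true_subject_variability=4.0, measurement_error=25.0)
no_discrimination = reliability(true_subject_variability=0.0, measurement_error=25.0)

print(f"Very different subjects: {mixed_group:.2f}")
print(f"Very similar subjects:   {similar_group:.2f}")
print(f"Identical ratings:       {no_discrimination:.2f}")
```

The third case corresponds to the “Able to walk” example: with no true subject variability, the reliability is 0 regardless of how small the measurement error is.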


The Importance of Reliability 


An instrument must be reliable to be useful in clinical or 
research practice. In fact, when developing or testing an in- 
strument, there is little point in determining the other 
properties of the tool (such as validity, feasibility, or ac- 
ceptability) if the reliability has not been clearly estab- 
lished. To illustrate this point, imagine that a surgeon has 
developed a new classification system to rate the quality 
of reduction following internal fixation of femoral neck 
fractures. The surgeon believes this new instrument will 
predict malunion more accurately than all other tools 
that have been described. She is eager to test her instrument,
and therefore rates the quality of reduction of the
next 100 patients who undergo internal fixation at her hospital.
Incredibly, she finds that her instrument predicts
malunion with 100% accuracy! The intrepid researcher co- 
ordinates a large international study to confirm her find- 
ings, but surprisingly finds that her instrument does not 
predict malunion any better than chance alone. What 
has gone wrong? 

Among other problems with her research design, the ea- 
ger researcher has neglected to measure the reliability of 
the new instrument. Depending on the content of her in- 
strument, it is possible (perhaps even likely) that other sur- 
geons applied the components differently. For example, if 
one of the criteria of the instrument is significant valgus 
alignment, some surgeons might interpret this to mean 
malalignment >15 degrees, whereas others might consider 
any degree of valgus alignment to be significant. If indivi- 
duals interpret components of an instrument differently, 
the reliability will likely be poor. This, in turn, places an 
upper limit on the validity of the instrument: it is impossible
to determine whether the instrument measures what it is
intended to measure when the measurements are not consistent.
Thus, the first step in determining the properties of an in- 
strument should be documenting the reliability. 


Key Concepts: Properties of a Good Instrument 

• Reliable: The instrument produces similar results
when used on different occasions or by different observers.

• Valid: The instrument measures what it is intended to
measure.

• Feasible: The instrument can be used by the people
who need it.

• Acceptable: The instrument is acceptable to the health
care community.
Types of Reliability


Researchers have described several different types of relia- 
bility that are essentially variations of the same theme. We 
will consider four types of reliability here: internal consis- 
tency, intraobserver, test-retest, and interobserver. 

The internal consistency is a property of an instrument
with several components. It refers to how well each item
relates independently to the rest of the items on the scale
and how they relate overall. Put more simply, it is a measure
of whether responses to different questions make
sense when they are considered together. For example, a
group of investigators recently measured the psychometric
properties of the Penn Shoulder Score. The instrument
includes three items regarding pain: (1) pain at rest,
(2) pain with normal activities, and (3) pain with strenuous
activities. If these items were truly measures of the same
thing (upper limb function), we would expect that patients
with more pain at rest would also, on average, have more
pain with normal and strenuous activities. In this study,
the internal consistency was measured with Cronbach’s
α and yielded a value of 0.93, which indicates excellent
internal consistency.

Intraobserver reliability measures the extent to which 
one rater performing multiple tests on the same subject 
yields similar results under different circumstances. The 
measurement setting may differ in a variety of ways, in- 
cluding location (use of an instrument in hospital versus 
at the patient’s home), time (day versus night), instrument 
(visual inspection versus goniometer), or any other factor 
that may impact the measurement. When all of these factors
are held constant, and the only difference is the time
that has elapsed between tests, the measurement is re- 
ferred to as test-retest reliability. This is often the first 
property of a test that is measured; if an instrument does 
not produce similar results when used by the same tester 
under exactly the same circumstances, it certainly will 
not function well when used by different testers in a vari- 
ety of settings. 

From a practical point of view, probably the most impor- 
tant property of an instrument is the interobserver relia- 
bility: the extent to which two raters obtain similar scores 
when testing the same subject. For a test to be consistently 
used by different individuals, it must have an acceptable 
level of interobserver reliability. Furthermore, if a test is 
shown to have acceptable interobserver reliability, by defi- 
nition it must also have acceptable intraobserver reliability 
because this measure is incorporated into the interobser- 
ver measurement. In other words, if two different indivi- 
duals provide the same rating of a subject at different 
times, it is fair to assume that one individual would also 
give the same ratings of a subject at different times. Thus, 
in evaluating a new diagnostic test, it is only necessary to 
measure the interobserver reliability. If the interobserver 
reliability is poor, it may be useful to know whether the in- 
traobserver reliability is also poor because this knowledge 
could assist the researcher in identifying the deficiencies
in the instrument and making appropriate modifications. 


Key Concepts: Types of Reliability 

• Internal consistency: Items in the instrument measure
the same thing.

• Test-retest: The results are the same at different times.

• Intraobserver: One rater gets the same result twice.

• Interobserver: Two (or more) raters get the same results.


Measures of Reliability 


Several statistical methods have been developed to mea- 
sure the reliability of an instrument. This section will 
briefly review the most commonly used measures; for a 
more complete description of the statistical techniques 
available, refer to Streiner and Norman¹ or Ludbrook.


Proportion Agreement 


The simplest measure of reliability is the proportion, or 
percentage agreement. Although easy to calculate and in- 
terpret, this measurement can be misleading and is not 
an acceptable statistic to describe the reliability of a test 
in isolation. The main problem with this statistic is that it 
does not account for chance. For example, Hayes and Peterson
measured the intraobserver reliability of Cyriax’s
resistance testing in subjects with shoulder pain. One of
the measurements was the resistance (strong or weak) to 
internal rotation of the shoulder. The results of this mea- 
surement are summarized in Table 15.1. 

To calculate the proportion agreement, one would add 
the values in the two agreement cells (strong/strong and 
weak/weak), and divide by the total number of subjects. 
This would yield: (24+2)/(24+1+1+2) = 0.93, or 93% agree- 
ment, which seems extremely good. 

Table 15.1 Summary of Intraobserver Results Observed
for Resistance to Shoulder Internal Rotation

                     Test 2
                     Strong    Weak    Total
Test 1    Strong     24        1       25
          Weak       1         2       3
          Total      25        3       28

Source: Data from Hayes KW, Peterson CM. Reliability of
classifications derived from Cyriax's resisted testing in subjects
with painful shoulders and knees. J Orthop Sports Phys Ther
2003;33:235-246.


To determine what would be expected through chance
alone (i.e., if the rater was guessing completely at random,
but with the same proportion of subjects given “strong”
and “weak” ratings), we can create another table with
the same proportion of responses for each testing session
by multiplying the row total by the column total and dividing
by the total number of subjects (Table 15.2). Calculating
the raw agreement in this table yields 22/28 = 0.79.
Therefore, we would expect to see agreement between
the two ratings 79% of the time, even if the rater was guessing
at random. Clearly the estimate of 93% is misleading.
The more the distribution of responses is skewed (i.e., the
higher the ratio of strong:weak or weak:strong), the higher
the agreement expected from chance alone. Another problem
with measuring the proportion agreement is that it cannot
be calculated for continuous data. Fortunately, several
statistical approaches have been developed to deal with these
problems.


Table 15.2 Expected Results if the Tester Was Assigning
Ratings at Random, but with the Same Ratio of Strong
to Weak

                     Test 2
                     Strong    Weak    Total
Test 1    Strong     22        3       25
          Weak       3         0       3
          Total      25        3       28
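The comparison between observed and chance-expected agreement can be sketched as follows. The function names are our own, and the table is entered as rows for test 1 and columns for test 2. Without rounding the expected cell counts to whole subjects, the chance-expected agreement works out to about 0.81 rather than the 0.79 obtained from the rounded counts in Table 15.2; the conclusion is the same either way.

```python
def observed_agreement(table):
    """Proportion of subjects in the agreement (diagonal) cells."""
    total = sum(sum(row) for row in table)
    agree = sum(table[i][i] for i in range(len(table)))
    return agree / total


def chance_agreement(table):
    """Agreement expected if ratings were assigned at random
    while keeping the same row and column totals."""
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Expected count in each diagonal cell = row total x column total / total.
    expected_diagonal = [r * c / total for r, c in zip(row_totals, col_totals)]
    return sum(expected_diagonal) / total


# Table 15.1: rows = test 1 (strong, weak), columns = test 2 (strong, weak).
shoulder = [[24, 1],
            [1, 2]]

observed = observed_agreement(shoulder)
chance = chance_agreement(shoulder)
print(f"Observed agreement: {observed:.2f}")
print(f"Chance agreement:   {chance:.2f}")
```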


Pearson Correlation 


One of the oldest statistics used to measure a relationship
between two variables is the Pearson correlation, or R. The
Pearson correlation measures the extent to which the
relationship between two variables can be described by a
straight line when the points are plotted. Figure 15.1 is a
scatter plot of data from two hypothetical studies of the
interobserver reliability of knee ROM. In the first study (solid
line) the agreement between reviewers is perfect, whereas
in the second study (dashed line) the agreement is much
worse: the reviewers actually do not agree on any
measurements. In both cases, however, the Pearson correlation
would equal 1.0, indicating perfect correlation. This is
because the Pearson R is a measure of the strength of a
relationship, but does not actually inform us of agreement.
Therefore, the Pearson correlation is an inappropriate
statistic to measure agreement in a reliability study.


Fig. 15.1 Hypothetical plot of results from two studies measuring
interobserver reliability of knee range of motion. The agreement
between observers is better in study 1 than study 2, but the
Pearson correlation is 1.0 in both.
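The point made about the Pearson correlation can be checked with a small numerical sketch. The knee ROM values below are invented for illustration, and the constant 15-degree offset in the second study is only one assumed way of producing a straight line with no agreement; the data behind Figure 15.1 are not given in the text.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical knee ROM (degrees) recorded by observer A in five patients.
observer_a = [90, 100, 110, 120, 130]

# Study 1: observer B records exactly the same values.
study1_b = list(observer_a)
# Study 2: observer B reads 15 degrees higher on every patient.
study2_b = [value + 15 for value in observer_a]

r_study1 = pearson_r(observer_a, study1_b)
r_study2 = pearson_r(observer_a, study2_b)
exact_matches_study2 = sum(a == b for a, b in zip(observer_a, study2_b))

print(f"Study 1: r = {r_study1:.2f}")
print(f"Study 2: r = {r_study2:.2f}, identical readings = {exact_matches_study2}")
```

Both studies give a correlation of 1.0, although in the second the observers never record the same value.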


Cronbach’s Alpha 


Cronbach’s α is a commonly reported measure of internal
consistency; other statistics used include split-halves and 
Kuder-Richardson. All of these statistics measure the cor- 
relation between items in an instrument. The differences 
lie in how items are selected for comparison, and whether 
the scales are dichotomous or continuous. 
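For readers who want to see how such a statistic is computed, the sketch below uses the standard formula for Cronbach's α, which the text does not spell out: α = k/(k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The function name and the pain scores for five patients are hypothetical and are not taken from the Penn Shoulder Score study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of subjects, each a list of item scores."""
    item_count = len(scores[0])
    # Sample variance of each item across subjects.
    item_variances = [
        variance([subject[i] for subject in scores]) for i in range(item_count)
    ]
    # Sample variance of each subject's total score.
    total_variance = variance([sum(subject) for subject in scores])
    return (item_count / (item_count - 1)) * (
        1 - sum(item_variances) / total_variance
    )


# Hypothetical scores: pain at rest, with normal activities,
# and with strenuous activities, for five patients.
pain_scores = [
    [1, 2, 4],
    [2, 3, 5],
    [3, 5, 6],
    [5, 6, 8],
    [6, 8, 9],
]

alpha = cronbach_alpha(pain_scores)
print(f"Cronbach's alpha: {alpha:.3f}")
```

Because patients with more pain at rest also report more pain on the other two items, the items move together and α is close to 1.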


Kappa and Weighted Kappa 


The kappa coefficient is a statistic that accounts for the ef- 
fect of chance in agreement; it is sometimes referred to as 
the “chance-corrected” agreement. It is calculated by com- 
paring the proportion of responses in the agreement cells 
with the proportion of responses that would be expected 
by chance alone. The calculation yields a result between
-1.0 and 1.0, where 1.0 indicates perfect agreement, 0.0 in- 
dicates no agreement beyond chance, and -1.0 indicates 
perfect disagreement. In the example discussed previously 
(Table 15.1), although the proportion agreement was 0.93, 
the kappa calculated was 0.63, which is substantially 
lower. There is an important disadvantage to using kappa 
in this setting: when the results are extremely maldistrib- 
uted in either direction (i.e., many positive or many nega- 
tive values), the possible agreement above chance becomes 
small, and kappa may underrepresent the degree of agreement.
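The chance correction described above can be written as kappa = (observed agreement - chance agreement) / (1 - chance agreement). The sketch below applies this to the data in Table 15.1; the function name `cohen_kappa` is our own.

```python
def cohen_kappa(table):
    """Chance-corrected agreement for a square table of counts
    (rows = first rating, columns = second rating)."""
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    observed = sum(table[i][i] for i in range(len(table))) / total
    chance = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - chance) / (1 - chance)


# Table 15.1: rows = test 1 (strong, weak), columns = test 2 (strong, weak).
shoulder = [[24, 1],
            [1, 2]]

kappa = cohen_kappa(shoulder)
print(f"Kappa: {kappa:.2f}")
```

For these data the sketch gives a value of about 0.63, matching the figure quoted in the text.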

Kappa can also be used to calculate agreement if there
are more than two possible responses (categorical data).
For example, one study examined the interobserver
reliability in the application of the Garden Index (which
includes four categories) for femoral neck fractures. Data
from two observers in this study are summarized in Table
15.3 (the study actually used six observers; kappa can also
be calculated with more than two observers, but for
simplicity only two are provided here). The principles of
calculating kappa with more than two categories are the same:
the proportion of responses in the agreement cells (all of
the cells on the diagonal line: 1-1, 2-2, 3-3, 4-4) are
compared with the proportion of responses that would be
expected in these cells by chance alone. In this case, kappa =
0.34. However, in this scale the categories are ordered, and
responses that are close together should surely get more
credit than responses that are further apart (i.e., 1-2
suggests better agreement than 1-4). An extension of this
approach, known as the weighted kappa, gives some credit
for partial agreement. Calculating the weighted kappa
involves assigning weights to each disagreement cell, then
calculating the agreement correcting for chance and
incorporating these weights. There are several different
methods for assigning the weightings, but by far the most
commonly used is called quadratic weighting.¹⁰ Using the data
provided, the weighted kappa calculated is 0.70, substantially
better than the unweighted estimate and a more
accurate measure of the agreement observed.


Table 15.3 Data of Two Observers from a Study Assessing
the Interobserver Agreement in the Application of the
Garden Index

                       Observer 2
                       1     2     3     4     Total
Observer 1    1        1     2     0     0     3
              2        1     0     0     0     1
              3        0     4     13    1     18
              4        0     0     7     7     14
              Total    2     6     20    8     36

Source: From Bjorgul K, Reikeras O. Low interobserver reliability of
radiographic signs predicting healing disturbance in displaced
intracapsular fracture of the femoral neck. Acta Orthop Scand
2002;73:307-310. Reprinted by permission of Informa Healthcare,
Informa UK Ltd.
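The weighted kappa for the Garden Index data in Table 15.3 can be sketched as follows. The sketch uses the common formulation kappa = 1 - (weighted observed disagreement / weighted chance-expected disagreement), with the squared distance between categories as the quadratic weight; the function and variable names are our own. For these data it yields about 0.33 unweighted (close to the 0.34 quoted above) and about 0.70 with quadratic weights.

```python
def weighted_kappa(table, disagreement_weight):
    """Kappa computed from weighted disagreement.

    disagreement_weight(i, j) returns the penalty for a subject rated
    category i by one observer and category j by the other (0 on the diagonal).
    """
    size = len(table)
    total = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    observed = 0.0
    expected = 0.0
    for i in range(size):
        for j in range(size):
            weight = disagreement_weight(i, j)
            observed += weight * table[i][j]
            expected += weight * row_totals[i] * col_totals[j] / total
    return 1 - observed / expected


# Table 15.3: rows = observer 1, columns = observer 2, Garden categories 1-4.
garden = [
    [1, 2, 0, 0],
    [1, 0, 0, 0],
    [0, 4, 13, 1],
    [0, 0, 7, 7],
]

# Every disagreement counts equally: ordinary (unweighted) kappa.
unweighted = weighted_kappa(garden, lambda i, j: 0 if i == j else 1)
# Penalty grows with the square of the distance between categories.
quadratic = weighted_kappa(garden, lambda i, j: (i - j) ** 2)

print(f"Unweighted kappa:         {unweighted:.2f}")
print(f"Quadratic weighted kappa: {quadratic:.2f}")
```

All 15 disagreements in this table are between adjacent categories, which is why giving partial credit raises the estimate so markedly.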


Phi 


Phi is a relatively new approach in medical statistics to
calculate “chance-independent” agreement.¹¹ Statistically,
phi is simply the odds ratio calculated from a 2 × 2 table,
converted to obtain a value from -1.0 (representing complete
disagreement) to 1.0 (representing complete agreement).
There are several theoretical advantages to using
phi rather than kappa; most notably, phi is resistant to
extreme distributions of results (recall that this is not the
case with kappa). When the distribution of results is
balanced (i.e., 50% are positive and 50% are negative), phi is
equal to kappa.

Looking at the data from Hayes’ study once more (Table
15.1), the agreement calculated using phi is 0.75. Note
that this value lies between the results calculated using
kappa (0.63) and the raw proportion (0.93). Given the
extreme distribution here, phi is probably the best
representation of the agreement.
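The text does not state how the odds ratio is converted to phi. The sketch below assumes the conversion phi = (√OR - 1)/(√OR + 1), which reproduces the value of 0.75 quoted for Table 15.1; this formula and the function name are assumptions on our part. The odds ratio, and therefore phi, cannot be computed when either disagreement cell is zero unless some correction is applied.

```python
import math


def phi_from_table(a, b, c, d):
    """Chance-independent agreement from a 2 x 2 table.

    a and d are the agreement cells, b and c the disagreement cells.
    Assumed conversion: phi = (sqrt(OR) - 1) / (sqrt(OR) + 1).
    """
    odds_ratio = (a * d) / (b * c)  # fails if b or c is zero
    root = math.sqrt(odds_ratio)
    return (root - 1) / (root + 1)


# Table 15.1: strong/strong = 24, weak/weak = 2, one disagreement each way.
phi_shoulder = phi_from_table(a=24, b=1, c=1, d=2)

# Hypothetical balanced table (50% positive, 50% negative for each rating).
# Kappa for this table is (0.8 - 0.5) / (1 - 0.5) = 0.6.
phi_balanced = phi_from_table(a=40, b=10, c=10, d=40)

print(f"Phi, Table 15.1:     {phi_shoulder:.2f}")
print(f"Phi, balanced table: {phi_balanced:.2f}")
```

The balanced example illustrates the statement that phi and kappa coincide when the distribution of results is even.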


Intraclass Correlation Coefficient 


Although kappa and phi provide useful measures of agreement
for dichotomous or categorical data, clinical
measurements commonly yield continuous data, such as height or
ROM. The most commonly reported statistic for measuring
the reliability of continuous data is the intraclass correlation
coefficient, or ICC.¹² The statistical basis for calculating
the ICC is derived from a repeated measures analysis of
variance (ANOVA). This approach has become popular among
statisticians because it is closest to the formal definition of
reliability: essentially it measures the proportion of total
variability that is due to true subject variability.

There are several different versions of the ICC, depending
on what assumptions are made and the underlying research
question.¹ Unfortunately, the flexibility available
also makes the calculation difficult for nonstatisticians 
and prone to error. Care must be taken in choosing the ap- 
propriate ICC for a given research context; clinicians with 
limited statistical background should consider seeking 
the assistance of statistical colleagues, or refer to one of 
the sources at the end of this chapter. 

On a positive note, the interpretation of the ICC is quite 
straightforward: like most of the agreement statistics 
that have been discussed, it ranges from -1.0 (indicating 
complete disagreement) to 1.0 (indicating complete agree- 
ment). Furthermore, although it is most commonly used 
for continuous data, it can also be used to calculate the 
agreement of categorical data, in which case it yields ex- 
actly the same result as kappa using quadratic weighting. 
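As an illustration only, the sketch below computes one of the simplest versions of the ICC, the one-way random-effects form, from the between-subject and within-subject mean squares: ICC = (MSB - MSW) / (MSB + (k - 1) × MSW), where k is the number of measurements per subject. The choice of this version, the function name, and the knee ROM values are all assumptions made for the example; a real study may call for a different version, as discussed above.

```python
def icc_one_way(ratings):
    """One-way random-effects ICC for single measurements.

    ratings: list of subjects, each a list of k measurements.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = sum(sum(subject) for subject in ratings) / (n * k)
    subject_means = [sum(subject) / k for subject in ratings]
    # Between-subject mean square (true subject variability plus error).
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square (measurement error).
    ms_within = sum(
        (value - mean) ** 2
        for subject, mean in zip(ratings, subject_means)
        for value in subject
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical knee ROM (degrees): two observers, five patients.
knee_rom = [
    [110, 112],
    [95, 99],
    [130, 128],
    [120, 125],
    [100, 98],
]

icc = icc_one_way(knee_rom)
print(f"ICC: {icc:.2f}")
```

Here the differences between observers (2 to 5 degrees) are small relative to the spread between patients, so the ICC is high.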


Key Concepts: Common Measures of Reliability 

• Proportion agreement: Easy to calculate and interpret,
but potentially misleading (especially if the distribution
of results is uneven): should not be used in isolation

• Pearson correlation (R): Measures association, not
agreement: should not be used in isolation

• Cronbach’s α: For internal consistency of an instrument

• Kappa: Most common measure for dichotomous data

• Weighted kappa (quadratic weighting): Variation of
kappa for categorical data

• Phi: Useful for dichotomous data, especially with
uneven distribution

• Intraclass correlation coefficient (ICC): Most common
measure for continuous data


Conducting a Reliability Study 


Featured in this section is a general framework with speci- 
fic points to consider when preparing your research pro- 
ject. Of course, every research question is unique, and 
this is not intended to provide a “cookbook” approach to 
planning a study. Regardless of the topic of interest or 
the design of the study, it is important to prepare a proto- 
col that clearly describes each step of the research study 
and potential problems before embarking on the project. 


1. Define the Question 


As with any clinical research project, it is important to de- 
fine all aspects of the study question precisely. Why is the 
study being proposed? Is this a new instrument that is
being compared with a gold standard, or a new use for 
an old instrument? Will the instrument be used primarily 
for research or clinical practice? Having a carefully con- 
structed research question will allow you to design the ap- 
propriate study and anticipate potential problems before 
they occur. 


2. Select the Appropriate Participants (Raters) 


As discussed previously, reliability is not a fixed property 
of the instrument, but rather a property of the instrument 
in a specific set of circumstances. Thus, if the results of the 
reliability study are to be of any use outside the study, it is 
important to make the circumstances in the study context 
as similar as possible to the setting in which the instru- 
ment will actually be used. The choice of raters depends 
on two factors: generalizability and feasibility. 

To maximize the generalizability of the study findings, 
the diversity of raters should be as broad as the diversity 
of individuals who will be using the instrument in practice.
If the instrument is designed for only orthopaedic sur- 
geons to use, then the raters in the study should also be 
surgeons. Similarly, if the tool is intended for physiothera- 
pists or research assistants, then these groups of indivi- 
duals should be incorporated into the rating sessions. 

Another factor to consider when selecting raters is the 
expertise of the individuals. If individuals with varying le- 
vels of expertise will use the tool in practice (as is usually 
the case), then this must be reflected in the study design. 
This may involve recruiting trainees and nonacademic sur- 
geons to participate in the rating process, rather than limit- 
ing participation to specialized surgeons practicing at large 
academic centers. Furthermore, for the results of the study 
to be generalizable to individuals from different institu- 
tions and countries, the study raters should ideally include 
individuals from different locations and backgrounds. 

Of course, all of these considerations must be weighed 
against the feasibility of conducting the study. It is not pos- 
sible to conduct a study that incorporates every specialty, 
level of expertise, and site to which your results may apply. 
However, the inferences that can be made regarding gener- 
alizability will be much more robust if the raters represent 
diverse backgrounds. 


3. Select the Appropriate Subjects 


The principles of selecting the subjects for the study are 
very similar to those discussed for the raters. It is impor- 
tant to include as diverse a group of subjects as you antici- 
pate applying the instrument to in practice. For example, in 
a study to determine the reliability of the Evans classifica- 
tion system for trochanteric fractures, it would be impor- 
tant to include a range of radiographs that represents 
each type of fracture. In addition, it would be ideal to in- 
clude radiographs of varying image quality, which mirror 
those that might be seen in clinical practice. The inclusion
of subjects that represent a broad range of scores (in this
case, different severities of fracture) also has statistical im- 
portance. From the definition of reliability, we have de- 
scribed why a test that does not discriminate between sub- 
jects will certainly have a poor reliability (because true 
subject variability will be low). As an extension of this con- 
cept, the reliability of a test can be artificially increased or 
decreased simply by manipulating the variability between 
subjects. For example, if all of the radiographs in the Evans 
classification study were type 2 fractures, the calculated 
reliability would be 0. Similarly, if all of the radiographs 
were clearly on the extremes of the scale (i.e., half type 1 
and half type 5), there would be perfect agreement among 
raters and a reliability of 1. Researchers should not exploit 
this property of reliability to yield their sought results. 
Rather, they should include subjects that represent the 
full spectrum of disease (or reduction, healing, etc.) that 
is normally seen in practice. 
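
The effect of subject variability on the calculated reliability can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it uses a one-way random-effects ICC (one common form of the ICC), the helper name `icc_oneway` is hypothetical, and the scores are invented. The same pattern of rater disagreement is applied to a homogeneous and a heterogeneous set of subjects.

```python
from statistics import mean

def icc_oneway(ratings):
    """One-way random-effects ICC for a subjects x raters table (list of rows)."""
    n = len(ratings)       # number of subjects
    k = len(ratings[0])    # number of raters per subject
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    # Between-subject and within-subject mean squares from a one-way ANOVA.
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    msw = sum((x - m) ** 2
              for row, m in zip(ratings, row_means)
              for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Invented data: the disagreement between the two raters is identical in both sets.
disagreement = [(0, 1), (1, 0), (0, 0), (0, 1), (1, 0), (0, 0)]
narrow_truth = [3, 3, 3, 3, 3, 3]   # homogeneous subjects
wide_truth = [1, 2, 3, 4, 5, 6]     # subjects spanning the full scale

narrow = [[t + a, t + b] for t, (a, b) in zip(narrow_truth, disagreement)]
wide = [[t + a, t + b] for t, (a, b) in zip(wide_truth, disagreement)]

icc_narrow = icc_oneway(narrow)
icc_wide = icc_oneway(wide)
print(f"homogeneous subjects:   ICC = {icc_narrow:.2f}")
print(f"heterogeneous subjects: ICC = {icc_wide:.2f}")
```

With these invented numbers the raters disagree to exactly the same extent in both sets, yet the homogeneous set yields a poor ICC and the heterogeneous set a high one. This is the property that should not be exploited when subjects are selected.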


4. Determine the Sample Size 


Sample size calculations for reliability studies can be more 
complex than for other research designs because there are 
two samples under the researcher’s control: the number of 
raters and the number of subjects. Manipulating either of 
these variables will alter the precision of the reliability es- 
timate obtained. There are many possible approaches to 
selecting the sample size for each of these groups; one ap- 
proach is to fix the number of raters based on generaliz- 
ability and feasibility, then estimate the number of sub- 
jects required to achieve the desired precision. 

Having established the importance of including a di- 
verse group of raters to maximize generalizability (point 
2), the number of raters required will depend on the diver- 
sity of the group. For example, if the study includes indivi- 
duals drawn from three levels of expertise (e.g., trainees, 
surgeons, radiologists), then the absolute minimum num- 
ber of raters would be three. Furthermore, to form any in- 
ferences regarding agreement within a group, or differ- 
ences in agreement between groups (e.g., do radiologists 
achieve better agreement than surgeons?), there must be 
a minimum of two raters from each group. Thus, in the ex- 
ample provided a minimum of six raters would be re- 
quired: two trainees, two surgeons, and two radiologists. 
A similar approach would be used if the study were taking place
at different centers, or if other variables needed to be in- 
cluded and controlled for. 

The desire to include multiple raters from each demo- 
graphic group must again be balanced against the feasibil- 
ity of completing the study. This will largely depend on the 
nature of the subjects: if the study consists of rating X-rays, 
it may be reasonable to involve five or more raters viewing 
each film; whereas if the subjects are patients being tested,
the number of raters will likely have to be more limited. 
Once the number of raters has been fixed, a calculation 
can be performed to estimate the number of subjects 
required. 
Traditionally, sample size calculations are performed to
determine the number of subjects necessary to reject a 
null hypothesis. This type of calculation has been de- 
scribed for reliability studies, whereby the investigators 
determine some minimally acceptable value and attempt 
to prove that the reliability is greater than it.'*!° However, 
in most reliability studies there is not really a null hypoth- 
esis that the investigators are attempting to disprove. 
Therefore, it is more logical to estimate the sample size 
based on the desired precision of the reliability estimate; 
in other words, the confidence interval (CI). 

Four variables are needed to estimate the sample size of 
subjects: the number of raters, the expected ICC, the confidence level (we
will assume 95% for the remainder of this section), and the 
precision (width of the CI). How to select the number of 
raters has been discussed. The expected ICC can be esti- 
mated from past studies, or an “acceptable” level can be 
used. Finally, the desired width of the CI can be selected. 
It is ideal to have narrow CIs; however, this must be ba- 
lanced against the feasibility of the sample size required. 

Once these four variables have been determined, the 
sample size required can be calculated. For simplicity, 
commonly used values and the resultant sample size re- 
quirements are displayed in Table 15.4. For a full discus- 
sion of the sample size requirements and the equations 
used to calculate these values, refer to Giraudeau and 
Mary.'® Note that the sample size requirement is highly de- 
pendent on the expected ICC. Because this is only a rough 
estimation, the actual CI, calculated at the end of the study,
may be quite different from the value used in the calcula- 
tion if the expected ICC is different from the observed re- 
sult. Thus, it is important to realize that this is simply an es- 
timate of the sample size required. Like any study, it is good 
to have some statistical support for the selection of sample 
size; however, other factors such as the feasibility of the 
study must also be considered. 
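
The values in Table 15.4 appear to be consistent with a large-sample approximation in which the number of subjects is n = z^2 * 2(1 - ICC)^2 (1 + (k - 1) ICC)^2 / (k (k - 1) h^2), rounded up, where k is the number of raters and h is the half-width of the CI. The sketch below assumes that approximation; the function name `subjects_required` and the choice z = 1.96 for a 95% CI are assumptions made here for illustration, and the exact equations should be taken from Giraudeau and Mary.

```python
import math

def subjects_required(raters, expected_icc, half_width, z=1.96):
    """Approximate number of subjects so that the CI for the ICC is roughly
    expected_icc +/- half_width (large-sample approximation, assumed here)."""
    k, rho = raters, expected_icc
    variance_factor = (2 * (1 - rho) ** 2 * (1 + (k - 1) * rho) ** 2
                       / (k * (k - 1)))
    return math.ceil(z ** 2 * variance_factor / half_width ** 2)

# Print one line per combination of rater count and expected ICC,
# with the required subjects for half-widths of 0.05, 0.10, and 0.20.
for k in (2, 4, 6, 10):
    for rho in (0.9, 0.8, 0.7, 0.6, 0.5):
        row = [subjects_required(k, rho, h) for h in (0.05, 0.10, 0.20)]
        print(k, rho, row)
```

Because the required number of subjects falls with the square of the half-width, doubling the acceptable width of the CI cuts the sample size to roughly one quarter, which is often the most practical lever when feasibility is limited.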


5. Conduct the Rating Session(s) 


The actual procedures used for the rating session(s) will 
depend on the nature of the instrument and the sample 
being tested. Clearly, different methods are necessary to 
test the reliability of a classification system in patients as 
opposed to radiographs. Nevertheless, there are some 
principles that apply to all situations. 

The goal of any reliability study is to provide information 
that can be directly applied to clinical or research practice. 
Hence, it is important that the testing environment is as
similar as possible to the actual clinical or research environment.
For example, if an instrument were designed to measure
ROM in hospitalized patients, it would be important to
conduct the reliability study in a hospital, with patients in 
the beds they would typically be in at the time of measurement.
Similarly, if a fracture classification system were
designed that involved the estimation of lengths or angles,
investigators should only make tools like rulers or protractors
available if the clinicians or researchers that would be
using the system would likely have access to similar instruments.


Table 15.4 Sample Size Estimation Using the Intraclass
Correlation Coefficient (ICC)

Number      Expected    Number of Subjects Required for
of Raters   ICC         95% Confidence Interval
                        ±0.05     ±0.10     ±0.20

2           0.9            56        14         4
            0.8           200        50        13
            0.7           400       100        25
            0.6           630       158        40
            0.5           865       217        55

4           0.9            36         9         3
            0.8           119        30         8
            0.7           222        56        14
            0.6           322        81        21
            0.5           401       101        26

6           0.9            31         8         2
            0.8           103        26         7
            0.7           187        47        12
            0.6           263        66        17
            0.5           314        79        20

10          0.9            29         8         2
            0.8            92        23         6
            0.7           164        41        11
            0.6           224        56        14
            0.5           259        65        17

Irrespective of the study setting, all participants should 
independently complete the ratings in identical testing si- 
tuations. This may involve multiple classification sessions 
or one large session. Raters must not discuss cases until 
all ratings have been completed. For example, in a study 
to measure the reliability of the Schatzker classification 
system for tibial plateau fractures, each rater could view 
the radiographs independently on a personal computer, 
or each image could be displayed for all raters simulta- 
neously via a digital projector (or light box). Either method 
would be acceptable (the ideal method would be which- 
ever was used most commonly in practice), but it would 
not be acceptable for some raters to view the images on 
personal computers and the rest to view projected images 
(unless this was a variable being intentionally tested). It is 
also unacceptable for raters to discuss each image before 
assigning a classification. 

Finally, raters should not be given additional information 
about the cases that is not intended to contribute to the clas- 
sification. Referring again to the Schatzker classification 
system, raters should assign ratings based only on the radiographic
appearance of the fracture; they should not have
access to other information such as patient age or mechan- 
ism of injury, which could potentially influence their rat- 
ings. Thus, in a radiographic reliability study, all patient 
identifiers should be removed from the images (for ethical 
as well as methodological reasons). In a study that involves 
examining patients, the raters should not be aware of the 
patient’s history or other clinical information. 


6. Analyze the Data 


Once the rating sessions have been completed, all data
should be entered into a computerized database for analy- 
sis. There are several statistical options available to de- 
scribe the agreement; see the previous section on mea- 
sures of reliability for a discussion of the most common 
statistics. Due to the wide variety of analytical methods
used and the possibility for differing results depending 
on the statistic, it is important to plan and document the 
analytic strategy in as much detail as possible prior to col- 
lecting the data. The most appropriate statistic will depend 
on the nature of the study question; whether the rating 
scale is dichotomous, categorical, or continuous; and the 
expected distribution of responses. 

There are advantages and disadvantages to every statis- 
tical strategy; hence, some authors have advocated calcu- 
lating and reporting more than one statistical summary.” 
For example, although the raw proportion agreement can 
be misleading in isolation, it may be useful to calculate 
and report it along with either chance-corrected or 
chance-independent agreement because it is more easily 
interpreted. Similarly, it might be worthwhile to conduct 
an ANOVA or regression analysis in conjunction with a 
more traditional measure of reliability to detect systematic 
differences in ratings, as well as simple agreement. 
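
As an illustration of reporting more than one summary, the sketch below computes the raw proportion agreement together with a chance-corrected statistic for two raters. Cohen's kappa is used here as the chance-corrected measure, which is a choice made for this example rather than a recommendation from the text; the function names and the ratings (Schatzker types I to VI) are invented.

```python
from collections import Counter

def proportion_agreement(a, b):
    """Fraction of cases on which the two raters assigned the same category."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Chance-corrected agreement between two raters (Cohen's kappa)."""
    n = len(a)
    observed = proportion_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected by chance, from each rater's category frequencies.
    expected = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / n ** 2
    return (observed - expected) / (1 - expected)

# Invented classifications of ten radiographs by two raters.
rater_1 = ["I", "I", "II", "II", "II", "III", "III", "IV", "V", "VI"]
rater_2 = ["I", "II", "II", "II", "III", "III", "III", "IV", "V", "V"]

agreement = proportion_agreement(rater_1, rater_2)
kappa = cohens_kappa(rater_1, rater_2)
print(f"raw agreement = {agreement:.2f}, kappa = {kappa:.3f}")
```

Here the raters agree on 7 of 10 cases, but once the agreement expected by chance (0.2 for these marginal frequencies) is removed, kappa is 0.625. Reporting both values lets readers see the easily interpreted raw figure without mistaking it for the chance-corrected one.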


7. Interpret the Results 


Although P values can be calculated for all of the statistics
discussed, in this case the magnitude of the agreement statistic
is far more important than the statistical significance.
Thus, the most appropriate results to report are the value
of the agreement statistic(s) and the associated CIs.

Several guidelines have been proposed to interpret the
level of agreement calculated in studies. The criteria for
the kappa coefficient most commonly cited in research
practice are those proposed by Fleiss and by Landis and
Koch.'® Fleiss’ guidelines separate the agreement into
three categories: excellent (>0.75), fair to good (0.4-0.75),
and poor (<0.4). Landis and Koch’s guidelines are somewhat
more liberal: almost perfect (>0.8), substantial (0.6-0.8),
moderate (0.4-0.6), fair (0.2-0.4), slight (0.0-0.2), and
poor (<0.0).

It is important to note that all of these classification systems
are completely arbitrary. Remember that the “acceptable”
level of reliability depends on several factors, including
the sample size, the patient population, the minimally
important difference, and the proposed use of the instrument.
Ultimately, the decision regarding which of these
guidelines to adopt (or not to use any guidelines) is at
the discretion of the researchers. However, if the researchers
choose to interpret the results of the study using a set of
criteria, it is important to make this decision prior to conducting
the study, and to document in the research protocol
what criteria will be used.


Key Concepts: Key Steps in Conducting a Reliability Study

1. Define the question as specifically as possible.

2. Select and recruit appropriate participants (raters).

3. Select and gather appropriate subjects.

4. Estimate the required sample size.

5. Conduct the rating session(s) systematically and
with appropriate safeguards.

6. Analyze the data using statistical methods determined
prior to commencing the study.

7. Interpret the results in the context of other research
in the field.


Critical Appraisal of the Surgical Literature


A group of investigators recently conducted a systematic
review of the methodological quality of 44 reliability studies
of fracture classification systems.'® Because a validated
checklist of recognized reliability study quality criteria
does not exist, a list of relevant items was derived. A
wide variety of study methodologies was discovered,
with the number of raters ranging from 2 to 36, and the
number of subjects (fractures) ranging from 10 to 200.
Interestingly, none of the studies justified the sample size
used, and only 9% of the studies incorporated raters that
were representative of the intended users of the classification.
Clearly, there is room for improvement in this relatively
new study methodology.


Examples from the Literature: Checklist for the Critical 

Appraisal of a Reliability Study 

1. Was the classification system(s) clearly described? 

2. Was the study population defined by clear inclusion 
and exclusion criteria? 

3. Were selected cases representative of the study po- 
pulation? 

4. Was the size of the sample justified? 
5. Were the raters representative of the intended users
of the instrument?

6. Was the number of raters appropriate?

7. Were the ratings applied independently during classification
sessions?

8. Were raters blinded to patient clinical information?

9. Was the true distribution of classification categories
in the sample estimated (if a “gold standard” exists)? 
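
The interpretation guidelines for kappa quoted under step 7 (Fleiss; Landis and Koch) can be written as a simple lookup, which is one way to fix the chosen criteria in the protocol before the data are collected. This is a minimal sketch: the function names are hypothetical, and the treatment of values that fall exactly on a cut point (assigned here to the higher category, except at the strict ">" thresholds) is an assumption, because the published ranges share their end points.

```python
def fleiss_category(kappa):
    """Fleiss' three categories as quoted in the text."""
    if kappa > 0.75:
        return "excellent"
    if kappa >= 0.4:
        return "fair to good"
    return "poor"

def landis_koch_category(kappa):
    """Landis and Koch's six categories as quoted in the text."""
    if kappa > 0.8:
        return "almost perfect"
    if kappa >= 0.6:
        return "substantial"
    if kappa >= 0.4:
        return "moderate"
    if kappa >= 0.2:
        return "fair"
    if kappa >= 0.0:
        return "slight"
    return "poor"

# The same kappa can receive different labels under the two guidelines.
for value in (0.85, 0.78, 0.625, 0.3, -0.1):
    print(value, fleiss_category(value), landis_koch_category(value))
```

The fact that one value can be labeled differently under the two schemes underlines why the labels should accompany, not replace, the agreement statistic and its CI.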
The Classification of Outcomes


“It is better to know some of the questions than all of the answers.”

— James Thurber


Summary


Here an introduction to outcome measures is presented.
Surgeons are provided with the tools to help them to determine
which outcome measures are most appropriate for
their use. Topics discussed in this chapter include perspectives,
time horizon, and using health-related quality of life
(HRQL) as an outcome measure. The difference between
generic and disease-specific health-related quality of life
measures is reviewed. Tips on selecting an appropriate
measure and on how to administer the HRQL measures
are also shared.


Introduction


Many new surgical and trauma procedures have been integrated
into practice without being fully evaluated for their
effectiveness. In some cases, new interventions, introduced
into practice despite a lack of evidence supporting
their effectiveness, have become routine practice, but have
later proved harmful, ineffective, or no more effective
than similar but less-expensive interventions.' Selecting
an appropriate outcome measure is a vital component of
fully evaluating a new procedure or therapy.


Different Perspectives for Measuring Outcomes 


Outcomes can be measured from several different perspectives,
and the appropriate perspective depends on the research
objective and research question. Perspectives include
those of the surgeon, the patient, the hospital, and
society (Table 16.1). The traditional perspective used
when measuring outcomes in most surgical specialties
has been that of the surgeon. Outcomes from the surgeon’s
perspective include operative complications, successful
procedures, or limbs saved. Unfortunately, the surgeon’s
perspective does not take into account the patient’s perspective.
Patients have an important role to play in communicating
the impact of disease and the effectiveness of
health care.? Well-developed patient-reported outcome
measures can provide a clinically relevant and scientifically
rigorous resource for including the patient perspective
in decisions about health care and subsequent evaluation.”
The perspectives of the hospital, the third-party payer, and
society are also important where appropriate. For instance, if
you were conducting an economic evaluation of two surgical
interventions, it would be important to consider not
only the perspective of the patient, but also the perspectives
of the hospital, third-party payer (if applicable), and
society. The research question and objective will
help to select the appropriate perspective(s).


Table 16.1 Example of Different Perspectives


Surgeon - Operative complications
Patient - Health-related quality of life
Hospital - Length of hospital stay
Society - Total costs to society


Measuring the Patient’s Perspective


Time Horizon


The time horizon refers to when and how often the research
participant’s status should be measured. In other
words, how long and how often do you follow a research
participant? The time horizon varies depending on the
condition or injury and requires expert opinion. Selecting
the appropriate time horizon also depends on the research
question and the research objective. For example, if the
research question is “In patients with a femoral shaft fracture,
does a new surgical technique promote fracture healing
and improve patients’ return to function compared to
the current standard of care?”, the time horizon should
be based on the time that it would take the femoral shaft
fracture to heal. A time horizon of 3 months would not
likely capture the fracture healing or the patients’ return
to full function. A more appropriate time horizon would
be 1 to 2 years after the surgical treatment. Table 16.2 provides
a list of different conditions and outcome measures
with corresponding time horizons.


Table 16.2 Example of Different Time Horizons


Chronic osteomyelitis (time without drainage - 2 years)
Tibia fracture (time to union - 1 year)
Open wound (time to wound healing - 3 months)
Hip fracture (time to development of avascular necrosis - 2 years)


Health-Related Quality of Life 


The World Health Organization defines health as “a state of 
complete physical, mental, and social well-being.”? Ques- 
tioning patients’ well-being within each of these areas is 
necessary to represent the concept of health when evaluat- 
ing a patient’s health in a clinical or research setting.! 
Health-related quality of life (HRQL) instruments measure 
aspects of this comprehensive concept of health. These 
measures encompass a broad spectrum of items including 
those associated with activities of daily life, such as work, 
recreation, household management, and relationships 
with family, friends, and social groups. HRQL considers 
not only the ability to function within these roles, but 
also the degree of satisfaction derived from performing 
them.! 

Including HRQL in surgical studies facilitates under- 
standing patients’ perspective on what is gained or lost 
as a result of treatment of an injury or a disease. Tradi- 
tional surgical measures such as radiographic healing 
and operative complications often do not provide definitive
answers about whether or not a procedure or treatment
is useful from a patient’s perspective. In fact, objective
measures may correlate poorly with a patient’s own feeling
of wellness.' HRQL is important to measure when assessing
the impact of a treatment for a long-term injury
or chronic disease where the goal of the treatment is to im- 
prove the patient’s physical function. In addition, im- 
proved physical function may result in benefits to other as- 
pects of health.! 

The decision to include an HRQL measure in a study
should be aligned with the trial’s objectives. The objectives 
of the study should be clearly stated in the protocol and the 
investigator should have a comprehensive understanding 
of the disease or injury, expected benefits or harm of treat- 
ment(s), and how these factors may influence a patient’s 
HRQL.! When choosing an HRQL measure for clinical 
research or when deciding whether the HRQL measure
used to report health status in an article of interest was appropriate,
the investigator should have a clear understanding
of the intended role of the measure.'! Some commonly
used HRQL parameters in the surgical literature are dis- 
cussed in Chapters 18 and 19. 


Key Concepts: Including Health-Related Quality of Life 
Measures as an Outcome Measure 

The decision to include an HRQL measure in a study
should be aligned with the trial’s objectives. To provide
the most comprehensive evaluation of treatment effects,
no matter the disease or intervention, investigators often
include both a disease-specific and a generic health measure.


There are two types of measures of HRQL. The first type, general
health and utility measures, addresses health in a broad
sense; these measures can be applied and compared across many
situations.! The second type, specific measures, addresses narrower
aspects of life related to a specific problem, function,
or manifestations of an underlying disease process.' Both
types of HRQL instruments have advantages and disadvantages.


Key Concepts: Generic versus Disease-Specific Health-Related
Quality of Life Measures

• Generic measures include general health and utility
measures, which address health in a broad sense;
they can be applied and compared across many situations.

• Disease-specific measures address narrower aspects
of life related to a specific problem, function,
or manifestations of an underlying disease process.!


Generic Measures 


A generic instrument measures a patient’s general health 
status including physical symptoms, function, and emo- 
tional status. The scope of a generic instrument is typically 
broad and does not include a detailed investigation into 
each area. It typically is administered in the form of a ques- 
tionnaire. The broad scope of a generic instrument has the 
advantage of allowing investigators to compare health sta- 
tus across different diseases, severities, interventions, and 
in some cases, across different cultures.' A disadvantage of 
generic instruments, however, is that they may not be sen- 
sitive enough to be able to detect small, but important 
changes.! 

A utility-based instrument is another form of generic instrument
that measures health status by using a continuum
anchored by death and optimum health. Assessment
of health utility is rooted in decision theory, which
models the decision-making process expected of rational 
individuals when faced with uncertain outcomes.' 
Through placement on a continuum with anchors of death 
and full health, utility measurement provides a method of 
comparing alternative interventions, patient populations, 
and diseases. Utility measurement is particularly useful 
when attempting to measure the cost-effectiveness of
competing interventions in which the cost of an interven- 
tion is related to the number of quality-adjusted life-years 
(QALYs) gained (see Chapter 13). 


Key Concepts: Approaches to Measuring Utilities 

• Rating scales or instruments

• Standard gamble: current or new, uncertain health

• Time trade-off: live long with current health or short
with perfect health


There are two approaches to preference measurement. The 
first approach focuses on the preferences of the general po- 
pulation; the other approach focuses on the preferences 
of the individual patient. The first approach uses rating 
scales, which are similar to general health instruments. 
Patients are asked to complete a questionnaire that rates 
their ability to function in physical, emotional, and social 
aspects of life. A mathematical model based on preference 
ratings of health states ideally derived from a random sam- 
ple of the general population is used to score the patient’s 
utility on the scale between death and full health. Exam- 
ples of such questionnaires are provided in Chapter 18. 

The second method evaluates the preferences of the in- 
dividual patient, ideally by making a decision under uncer- 
tainty. Examples of this type of preference measure in- 
clude the standard gamble and time trade-off. When ad- 
ministering the standard gamble, patients are given two 
choices: to remain in their current health state or to take 
a gamble and try a new treatment for which the outcome 
is uncertain.* The uncertain outcome is presented as the 
probability of obtaining perfect health or immediate death 
with treatment. A patient’s score is defined by the point of 
indifference at which the patient has no preference be- 
tween remaining in their current health state or taking a 
chance with a new treatment and its uncertainty. 

The time trade-off method is similar to the standard 
gamble except that instead of probabilities of perfect 
health or immediate death, patients are offered a choice 
of living for a defined amount of time in perfect health or 
a variable amount of time in an alternate state of health 
that is less desirable.‘ Patients are asked to choose between 
diminishing years spent in perfect health and a defined 
amount of time in their current health state. A patient’s 
score is defined as the point of indifference. 
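
The scoring of these two methods can be sketched in a few lines. The sketch assumes the conventional scale on which death is 0 and perfect health is 1, so that the standard gamble utility is the probability of perfect health at the point of indifference, and the time trade-off utility is the ratio of years in perfect health to years in the current state at the point of indifference. The function names and the example values are hypothetical.

```python
def standard_gamble_utility(p_perfect_health):
    """Utility of the current health state from a standard gamble.

    p_perfect_health is the probability of perfect health (versus immediate
    death) at which the patient is indifferent between the gamble and
    remaining in the current state.
    """
    if not 0.0 <= p_perfect_health <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return p_perfect_health

def time_tradeoff_utility(years_perfect_health, years_current_health):
    """Utility of the current health state from a time trade-off.

    The patient is indifferent between years_perfect_health in perfect
    health and years_current_health in the current state.
    """
    if years_current_health <= 0:
        raise ValueError("years in current health must be positive")
    if not 0 <= years_perfect_health <= years_current_health:
        raise ValueError("years in perfect health must not exceed years in current health")
    return years_perfect_health / years_current_health

# Invented patient: indifferent at an 85% chance of perfect health, and
# indifferent between 7 years of perfect health and 10 years as they are now.
sg = standard_gamble_utility(0.85)
tto = time_tradeoff_utility(7, 10)
print(f"standard gamble utility = {sg:.2f}, time trade-off utility = {tto:.2f}")
```

The two methods need not give the same value for the same patient, because one asks about risk and the other about time; the example values were chosen to differ for that reason.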

Criticisms of the standard gamble and time trade-off in- 
clude the unrealistic choices patients are given (having to 
choose between perfect health and immediate death), 
which are not representative of the choices that patients
are commonly faced with. In addition, the standard gamble 
demands a proficient understanding of probability if the 
measure is to be valid.! 
Disease-Specific Measures


Disease-specific measures are tailored to inquire about the 
specific physical, mental, and social aspects of health af- 
fected by the disease or injury in question. An advantage 
to using a disease-specific instrument is that small but important
changes can be detected. A disadvantage of
disease-specific instruments is that they do not allow for
comparisons between studies of different diseases, and 
sometimes do not even allow for comparisons between 
different populations within the same disease (e.g., chil- 
dren and adults) because they are so focused (Table 16.3).! 


Modes of Administration 


Methods of administering HRQL instruments include per- 
sonal interviews, mailing of the questionnaire to the pa- 
tient, telephone interviews, and patient self-administra- 
tion. The personal interviews and the patient self-adminis- 
tration can take place while the patient is still in hospital, 
in the clinic at a follow-up appointment, or at the patient’s 
home. In cases where the patient is too sick to complete 
questionnaires, a proxy or stand-in may answer questions 
on behalf of the patient. The strengths and weaknesses of 
the different modes of administration are summarized in 
Table 16.4. The choice of which method to use will depend 
largely on the research question, characteristics of the in- 
strument, characteristics of the patient population, and 
feasibility issues associated with cost and patient burden.! 


Selecting an Appropriate Measure 


Selecting the appropriate outcome measures is an important
phase in the development of the study protocol. The
first step in selecting the outcome measures for any research
initiative is to formulate and describe the study’s research
objectives, research questions, and aims. It is necessary
to understand the purpose of the research initiative so
that appropriate selections can be made from the many
available measurement instruments. The next step is to
determine how the outcomes will be measured. This can
be dependent on the study design and the resources available.
In general, short, feasible, well-validated, and reliable
instruments are recommended if care providers are to use
them in their clinical work, where time is of the essence.
Investigators should also consider using a combination of
outcomes, including clinical outcomes (Table 16.5) and
HRQL. Research ethics review boards require that adverse
events and complications be recorded and reported for
all clinical trials.


Table 16.3 Advantages and Disadvantages of Generic and
Disease-Specific Health-Related Quality of Life Measures

Type of Instrument  Advantages                    Disadvantages

Generic             Allow investigators to        They may not be sensitive
                    compare health status         enough to be able to detect
                    across different diseases,    small, but important changes.
                    severities, interventions,
                    and in some cases, across
                    different cultures

Disease-specific    Small, important changes      Do not allow for comparisons
                    can be detected.              between studies of different
                                                  diseases
                                                  Sometimes do not allow for
                                                  comparisons between different
                                                  populations within the same
                                                  disease (e.g., children and
                                                  adults)


Table 16.4 Advantages and Disadvantages of Different Modes of Health-Related Quality of Life Administration

Mode of         Advantages                                          Disadvantages
Administration

Interviewer     Maximal response rate                               Costly
                Can clarify questions                               Interviewer bias
                Higher completion rate                              Reporting bias
                Control over who is the respondent                  Characteristics of the interviewer (voice inflections,
                Control over the order of questions                   age, race, gender) may introduce bias.

Telephone       Greater response rate than request by mail          Excludes those without access to a telephone
                Relatively inexpensive                              Voice inflections of the interviewer may introduce bias.
                Relatively quick data collection
                Interviewer can probe for incomplete answers.
                Data collector can get clarification for ambiguous
                  answers.

Mail            Relatively inexpensive                              Response rates generally low
                No bias introduced through the interviewer          Possibility of bias due to nonresponse
                May reach more respondents                          No control over who is the respondent
                Respondents can take time to locate certain         May misunderstand the question
                  information                                       May miss questions (incomplete)
                                                                    Questionnaire may be lost in the mail.
                                                                    Excludes illiterate, less educated, handicapped, and
                                                                      non-English-speaking populations

Self            Maximal response rate                               May misunderstand the question
                Inexpensive                                         May miss questions (incomplete)

Proxy           Can collect information on patients who otherwise   Response may differ from target
                  are not represented

Source: From Jackowski D, Guyatt GH. A guide to health measurement. Clin Orthop Relat Res 2003;413:80-89. Reprinted by permission.

To provide the most comprehensive evaluation of treatment
effects, no matter the disease or intervention, investigators
often include both a disease-specific and a generic health
measure. In fact, many granting agencies and ethics boards
insist that a generic instrument be included in the design of
proposed clinical studies.¹


Table 16.5 Examples of Clinical Outcomes

Blood loss during surgery
Number of days to radiographic fracture healing
Infection
Adverse events and complications
Reoperations or revision surgeries
Blood pressure


Tips for Administering Health-Related Quality of Life Questionnaires

Administering a questionnaire in the best possible way can 
be surprisingly difficult, especially for someone with lim- 
ited experience in questionnaire administration. Many of 
the approaches one might intuitively take can actually 
cause problems. The following guidelines will hopefully 
help interviewers and administrators to develop optimal 
procedures. 

Be as informal, friendly, relaxed, and conversational as 
possible when interacting with patients and research par- 
ticipants. The better the rapport with a patient, the more 
easily and happily they will acquiesce to requests, like fill- 
ing out questionnaires. This informality and relaxed ap- 
proach is encouraged before and after administering the 
questionnaires. However, in the actual administration of 
the questionnaire, a different style is required. 

It is important never to “help” the patient determine the
answer to a question on a HRQL instrument. Initially, many
patients may feel as if the questionnaire is some kind of
test, and they may look for the administrator’s approval to
ensure they are providing the correct answer. Of course,
there are no right or wrong answers; patients are simply
supposed to answer the questions as they understand them
and to pick the option that they think best expresses how
they have been feeling.

If a patient asks a question regarding which response option
to select, it is important to remain neutral in your response
to them. Our instincts (and perhaps our formal training) lead
us to be as helpful as possible, and it may be hard to avoid
saying “Yes, that’s right” to a question of this sort. Such a
response, however, would destroy the validity of the
questionnaire. It must be made clear to patients that it is
their choice, and that whatever they choose is all right. The
appropriate answer to the patient’s question would convey,
“There is no right or wrong answer. You pick whatever number
you think best describes how you have been feeling.”

If a research participant has a question about the mean- 
ing of a question, never explain, elaborate, or paraphrase 
questionnaire items into your own words. When the pa- 
tient does not understand a question, repeat the question 
exactly as worded. Often, after repeating the question, 
the patient will “pick up” on the meaning. If not and they 
are still confused, some useful phrases are “It is whatever 
it means to you” or “There are no right or wrong answers, 
we would like you to choose the best answer that you can.” 

It is necessary to be neutral in your response to patients’ 
answers. If a patient tells you he or she is unhappy, or de- 
pressed, it would ordinarily be impolite or inconsiderate 
not to express sympathy and empathy. If you are trained 
as a nurse, doctor, or other health care worker, you may 
even feel you should volunteer advice, suggestions, or en- 
couragement. This is all fine to do before and after the 
questionnaire, but while administering the questionnaire 
you must be totally neutral in your responses to the pa- 
tients' answers. You must be careful that nothing in your 
words or manner implies surprise, sympathy, approval, 
or disapproval of the patients’ answers. 

It is important to watch for inconsistencies in the completion
of the questionnaire. For example, in the SF-12v2 (a short
form generic health questionnaire),° there are questions that
ask about the patient’s health. One of the questions asks
patients about their health in general and another asks about
how their health has limited them when doing moderate
activities. If the patient indicates that their health is
excellent but goes on to indicate that their health has
limited them a lot when doing moderate activities, you would
need to verify the answers. One way of handling this is to
say, “You have indicated that your health is excellent. You
have also answered that your health limits you a lot when
doing moderate activities - is this correct?”

Sometimes, patients may not understand one of the questions,
or they may not read the response options carefully and so
misinterpret them. When this happens, the key to handling the
situation is to read the response the patient has given for
the question without suggesting there is something wrong with
the answer presented. The patient must feel no pressure to
change an answer. One way of handling this situation is to ask
a question, such as “The answer that you have given means that
when you were speaking with people who know you well during
the past 2 weeks they were not able to understand you. Is it
true that, in the last 2 weeks, when you were speaking with
people who know you well, they didn't understand you?” If the
patient answers that it is not true, ask them to review the
question and see if they want to change their answer.


Key Concepts: Tips for Health-Related Quality of Life Instrument Administration

• Never “help” a research participant determine the answer to
  a question on a HRQL instrument.

• If a patient asks a question regarding which response option
  to select, it is important to remain neutral in your
  response to them.

• Do not explain, elaborate, or paraphrase questionnaire items
  into your own words.

• Remain neutral in response to the research participant’s
  answers.

• Look for inconsistencies in answers.


Conclusions


Choosing the appropriate outcomes to measure is an important
decision when planning for any research initiative. It needs
to be a well-informed decision, and reviewing the relevant
literature and consulting with experts are required. The
investigator needs to match the study outcomes and instruments
with the research question and the study aims and objectives.
Most studies benefit from using multiple outcome measures,
including the appropriate clinical outcome measures, a generic
quality of life instrument, and a disease-specific quality of
life instrument. In addition, the investigator needs to be
careful not to overburden the study staff and research
participants with too many or too cumbersome outcome measures.
It is also important that the person administering the
questionnaire is trained appropriately.
What Makes an Outcome Measure Useful


“If we all worked on the assumption that what is accepted as true is really true, there
would be little hope of advance.”

— Orville Wright


Summary


What makes an outcome measure useful is the subject of this
chapter. For the health-related quality of life (HRQL)
instruments discussed, this would include reliability,
validity, and responsiveness to change. Awareness of cultural
and language barriers in creation of an outcome measure is
also addressed. In addition, what elements to look for and to
include in studies that measure HRQL are reviewed.


Introduction


To be useful in a research or clinical setting, a
health-related quality of life (HRQL) instrument, like any
other method of measurement, needs to be valid.' Validity is
the degree to which the measure reflects the construct that it
was intended to measure.” Discriminative instruments must also
be reliable; they must yield similar results upon repeated
administration to a stable population.* The distinction
between reliability and validity is an important one: a
measure can be reliable, but not necessarily valid (i.e.,
provides a consistent measure of an unintended attribute).¹
“Instruments used to evaluate change over time must not only
be valid, but also able to detect important changes even if
those changes are small.”¹


Jargon Simplified: Validity

“Validity represents the extent to which an instrument is
measuring what it is intended to measure.”?


Jargon Simplified: Reliability

“Reliability refers to the consistency or reproducibility of
data. In other words, the ability of a measure to yield the
same result when reapplied to stable patients.”


Key Concepts: Validity and Reliability

“The distinction between reliability and validity is an
important one as a measure can be reliable, but not valid
(provides a consistent measure of an unintended attribute).
Instruments used to evaluate change with time must not only be
valid, but also must detect important changes even if those
changes are small.”¹


Reliability


Reliability is the extent to which an instrument yields the
same results in repeated applications in a population with
stable health.? Reliability is assessed by tests of
reproducibility or repeatability. Several different methods
may be used to assess the reliability of an outcome measure.”
These methods include interrater, test-retest, and internal
consistency reliability.’ “Interrater reliability is expressed
as the magnitude of agreement between scores given to the same
patient by two or more raters when no change in the health
status of the patient has occurred.”” Essentially, interrater
reliability is the degree of agreement between different
observers. “Test-retest reliability is expressed as the
magnitude of agreement between scores obtained through
repeated measures of the same patients over stable health
conditions.”? Intraobserver or test-retest reliability is the
agreement between observations made by the same observer. The
intraclass correlation coefficient (ICC) is one of the
statistical measures of agreement that can be used for
assessing test-retest reliability.” To estimate test-retest
reliability, the same HRQL instrument is completed by the same
patient on two different occasions with the assumption that no
substantial change in the health status of the patient has
occurred between the test dates.’
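
As an illustration only, the test-retest estimate described above can be sketched in Python. The text does not say which form of the ICC to use; this sketch assumes a two-way random-effects ICC for absolute agreement of single measurements, and the function name and patient scores are hypothetical.

```python
def icc_absolute_agreement(scores):
    """Two-way random-effects ICC, absolute agreement, single measurement.

    scores: one row per patient, one column per administration.
    The choice of this ICC form is an assumption made for illustration.
    """
    n = len(scores)      # number of patients
    k = len(scores[0])   # number of administrations
    if n < 2 or k < 2:
        raise ValueError("need at least 2 patients and 2 administrations")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Partition the total sum of squares (two-way ANOVA without replication).
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_patients = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_occasions = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_error = ss_total - ss_patients - ss_occasions

    ms_patients = ss_patients / (n - 1)
    ms_occasions = ss_occasions / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = (ms_patients + (k - 1) * ms_error
                   + k * (ms_occasions - ms_error) / n)
    if denominator == 0:
        raise ValueError("scores show no variation")
    return (ms_patients - ms_error) / denominator


# Hypothetical HRQL scores for five stable patients, each tested twice.
test_retest = [[62, 64], [45, 43], [80, 82], [55, 55], [70, 67]]
print(round(icc_absolute_agreement(test_retest), 3))
```

A value near one indicates that the two administrations agree closely; identical scores on both occasions give a coefficient of one.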

The reliability of a test is indicated by the reliability
coefficient. “Reliability is expressed as a number ranging
between zero and one; as it approaches zero there is lower
reliability and a reliability coefficient close to one
indicates higher reliability.”” Therefore, the larger a
reliability coefficient is, the more repeatable or reliable
the test scores. General guidelines exist for interpreting
reliability coefficients. A reliability coefficient of 0.90
or greater is said to be excellent; a coefficient of 0.80 to
0.89 is good; a coefficient of 0.70 to 0.79 is adequate; and
an instrument with a coefficient below 0.70 may have limited
applicability.
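
The interpretation bands above can be written as a small lookup. This is a minimal sketch: the function name is hypothetical, and treating the bands as continuous (so that 0.895, for example, counts as good) is an assumption, since the text lists them to two decimal places.

```python
def interpret_reliability(coefficient):
    """Map a reliability coefficient (0 to 1) to the bands given in the text."""
    if not 0.0 <= coefficient <= 1.0:
        raise ValueError("a reliability coefficient lies between zero and one")
    if coefficient >= 0.90:
        return "excellent"
    if coefficient >= 0.80:
        return "good"
    if coefficient >= 0.70:
        return "adequate"
    return "limited applicability"


for value in (0.93, 0.84, 0.71, 0.55):
    print(value, interpret_reliability(value))
```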

“Internal consistency is a reflection of the homogeneity 
of the items that make up the questionnaire measuring a 
particular construct. The more correlated the items, the 
greater the estimate of internal consistency.”! 
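
The text does not name a statistic for internal consistency. One widely used estimate is Cronbach's alpha, sketched below as an assumption rather than as the authors' method; the function name and responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a set of questionnaire items.

    responses: one row per respondent, one column per item.
    """
    if len(responses) < 2:
        raise ValueError("need at least 2 respondents")
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least 2 items")

    # Variance of each item across respondents, and of the summed score.
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores show no variation")

    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers from four respondents to a three-item scale.
answers = [[1, 2, 1], [3, 3, 2], [4, 4, 4], [5, 4, 5]]
print(round(cronbach_alpha(answers), 3))
```

The more strongly the items move together across respondents, the closer the estimate is to one, which matches the quoted description.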

Several factors may influence the reliability of a measure 
between test dates. These factors include differences be- 
tween the conditions of administration; the effects caused 
by repeated testing, such as learning and regression to the 
mean; specific factors affecting participants in their daily 
lives, such as mood and circumstance; and the length of 
time between test administration.! The assessment of re- 
liability is further complicated by the fact that changes in 
the attribute being measured may have occurred since 
the administration of the first test and the second test.! 


Jargon Simplified: Interrater Reliability 

“Inter-rater reliability is expressed as the magnitude of 
agreement between scores given to the same patient 
by two or more raters when no change in the health sta- 
tus of the patient has occurred.” 


Validity 


Validity is an estimation of the extent to which an instru- 
ment measures what it was intended to measure.’ Validat- 
ing an instrument involves accumulating evidence of the 
degree to which the measure represents what it was in- 
tended to represent.' There are several different methods 
to assess the validity of a HRQL instrument, including 
face validity, content validity, and construct validity. Each 
of these types of validity is described below. 


Face Validity 


“An instrument has face validity if it appears to be measur- 
ing what it is intended to measure.”! In face validity, the 
developer and other experts look at the new HRQL instru- 
ment to see whether it seems like a good translation of the 
construct.” Face validity is the weakest method of demon- 
strating validity. 


Jargon Simplified: Face Validity 
“An instrument has face validity if it appears to be mea- 
suring what it is intended to measure.”! 


Jargon Simplified: Test-Retest Reliability 

“Test-retest reliability is expressed as the magnitude of 
agreement between scores obtained through repeated 
measures of the same patients over stable health condi- 
tions.”? 


Jargon Simplified: Internal Consistency 

“Internal consistency is a reflection of the homogeneity 
of the items that make up the questionnaire measuring a 
particular construct. The more correlated the items, the 
greater the estimate of internal consistency.”! 


Jargon Simplified: Reliability Coefficient

The reliability coefficient is expressed as a number ranging
between zero and one; as it approaches zero, there is lower
reliability and a reliability coefficient close to one
indicates higher reliability.’


Key Concepts: Factors that May Influence Reliability

• Differences between the conditions of administration

• Effects caused by repeated testing, such as learning and
  regression to the mean

• Specific factors affecting participants in their daily
  lives, such as mood and circumstance

• The length of time between administrations

Content Validity 


“Content validity is the extent to which items that make up 
a measure are inclusive of the construct of interest.”! Con- 
tent validity refers to the ability of the items on a HRQL in- 
strument to adequately measure the content of the prop- 
erty that the instrument is designed to measure. A clear 
understanding of the concept being measured is essential 
when constructing an instrument and when assessing 
content validity.! Evaluation of content validity tends to 
be subjective, but ideally includes systematic comparison 
of the new measure with existing and established theore- 
tical definitions, expert opinions, and interviews with in- 
dividuals for whom the measure is targeted to ensure 
that all dimensions are fairly represented.! 


Jargon Simplified: Content Validity 
“Content validity is the extent to which items that make 
up a measure are inclusive of the construct of interest.”! 


Construct Validity


A construct is a theoretical framework that represents an
idea.' Health is the construct when measuring health status.
Because it is not possible to measure all aspects of a
person’s health directly, we must also measure things we
associate with good health such as no pain and being able to
work, play, and interact with family and friends.¹ An
understanding of the concept of health within the population
of interest can guide researchers as to how the instrument
should function with respect to differing disease states and
in response to treatment.¹ “In the evaluation of construct
validity, one hypothesizes the course (and sometimes the
magnitude) of relationships that may develop either as a
result of an intervention or in relation to a different
population. Hypotheses are then confirmed or refuted through
observation.”¹


Jargon Simplified: Construct Validity 

“In the evaluation of construct validity, one hypothesizes 
the course (and sometimes the magnitude) of relation- 
ships that may develop either as a result of an interven- 
tion or in relation to a different population. Hypotheses 
are then confirmed or refuted through observation.”! 


Two measures of construct validity are convergent validity
and discriminant validity. “Convergent validity refers to the
extent to which different ways of measuring the same attribute
correlate with one another.”¹ For example, one would expect
two instruments that claim to measure quality of life in
patients with osteoarthritis of the knee who have undergone
arthroplasty to function in a similar manner.¹ “In
discriminate validity, a measure does not correlate strongly
with measures that are intended to measure different
attributes.”¹ For example, in a validity study for the
arthroplasty instruments for hip and knee arthroplasties, the
arthroplasty instrument showed almost no correlation with most
of the generic HRQL dimensions.*


Jargon Simplified: Convergent Validity

“Convergent validity refers to the extent to which different
ways of measuring the same attribute correlate with one
another.”¹


Jargon Simplified: Discriminant Validity

“In discriminate validity, a measure does not correlate
strongly with measures that are intended to measure different
attributes.”¹


“Criterion validity is often viewed as a special case of
construct validity, in which stronger hypotheses about the
expected behavior of the new instrument are possible by
comparison with a measure that is accepted as a gold
standard.” Unfortunately, it is rare for an absolute gold
standard to exist for measuring HRQL. However, in some cases,
standard functional ability, such as reaching rehabilitation
milestones following an intervention, is established well
enough to make and test hypotheses about HRQL and the ability
to function.¹


Jargon Simplified: Criterion Validity

“Criterion validity is often viewed as a special case of
construct validity, in which stronger hypotheses about the
expected behavior of the new instrument are possible by
comparison with a measure that is accepted as a gold
standard.”¹


Responsiveness to Change


As described in the previous chapter, generic HRQL instruments
typically have a broad perspective that is not specifically
related to the restricted scope of the HRQL of a specific
disease or condition. Using a generic instrument has the
advantage of allowing comparisons of health status to be made
across different diseases and health states.” A
disease-specific outcome measure focuses on the disease or
condition being studied, allowing greater sensitivity to
intervention-related change compared with a generic measure.
When deciding whether to use a generic instrument or a
disease-specific instrument to measure HRQL, it is important
to consider the responsiveness to change of each instrument.®

“Responsiveness to change refers to the ability of an HRQL
instrument to reflect underlying change or true difference
between pretreatment and posttreatment scores within the
construct being measured.”¹ Responsiveness is therefore
proportional to the change in scores that constitutes a
clinically important change, and inversely proportional to the
variability in scores across stable patients in the study.¹


Jargon Simplified: Responsiveness to Change 
“Responsiveness to change refers to the ability of an 
HRQL instrument to reflect underlying change or true 
difference between pre-treatment and post-treatment 
scores within the construct being measured.”! 
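
The chapter describes responsiveness as proportional to the clinically important change in scores and inversely proportional to the variability in scores across stable patients. One way to express that relationship is as a simple ratio, sketched below; the function name, the choice of the standard deviation as the measure of variability, and the numbers are assumptions for illustration.

```python
from statistics import stdev


def responsiveness_index(important_change, stable_change_scores):
    """Ratio of a clinically important change to variability in stable patients.

    important_change: score change judged clinically important.
    stable_change_scores: score changes observed in patients whose health
        did not change between administrations.
    """
    spread = stdev(stable_change_scores)
    if spread == 0:
        raise ValueError("stable patients show no variability")
    return important_change / spread


# Hypothetical values: a 5-point change matters clinically, and stable
# patients drift by a few points between administrations.
print(responsiveness_index(5, [-2, 0, 2]))
```

A larger ratio means the important change stands out more clearly from the background fluctuation seen in patients whose health is stable.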


The two major aspects of responsiveness are internal
responsiveness and external responsiveness.® “Internal
responsiveness characterizes the ability of a measure to
change over a prespecified timeframe.”® “External
responsiveness reflects the extent to which change in a
measure relates to a corresponding change in a reference
measure of clinical or health status.”


Key Concepts: Internal versus External Responsiveness

• “Internal responsiveness characterizes the ability of a
  measure to change over a pre-specified timeframe.”°

• “External responsiveness reflects the extent to which
  change in a measure relates to a corresponding change in a
  reference measure of clinical or health status.”®
An important aspect of a HRQL instrument’s responsiveness to
change is its ability to detect any effect that is important
to patients, even if that effect is small.¹ This is often
referred to as the “minimally important difference.” In
particular, instruments should detect change that would lead
patients to use or accept a new treatment.¹

In trials that measure HRQL and then report no significant
difference in HRQL, surgeons must be cautious in accepting the
results without first determining whether the instrument has
demonstrated the ability to detect clinically important
effects in a similar population in previous studies.¹ If not,
it is possible that the failure to detect a difference is due
to the lack of responsiveness of the instrument and not to the
ineffectiveness of treatment.'


Cultural and Language Barriers


Health can be defined as a construct reflecting an
individual’s ability to function effectively in perceived
roles and his or her satisfaction with function. Patients’
experiences of their function and their satisfaction may vary
in different cultural settings.¹ Social and cultural
experiences often influence how an individual interprets the
consequences of disease or injury and its related treatment.
Patients who have a particular injury of similar severity may
experience the injury, its treatments, and its effects on
quality of life differently: some with complete despair, some
with acceptance, and some even with contentment.¹

How an individual experiences disease or illness will depend
on how social groups and cultures view death, dying, and
illness.' For a HRQL instrument to remain valid across
different cultures, the items on the questionnaires and the
responses to them must be conceptually and functionally
equivalent across each of the cultures being considered.¹ It
is not enough to know that the words have equivalent meanings,
as in a straight translation from one language to another; one
must understand to what extent equivalent words and phrases
convey equivalent meanings.¹


Reviewing and Reporting Health-Related Quality of Life Studies 


The following checklist briefly outlines important issues 
that surgeons should be aware of when reviewing or re- 
porting a study that reports HRQL. 


Examples from the Literature: Checklist for the Review and Report of a HRQL Study

Has the researcher clearly stated the objectives of the 
study? 

1. Has the role of HRQL in meeting these objectives 
been defined? 


Has the instrument demonstrated validity?

1. Is there a reference made or description of how the
instrument was developed?

2. Does the instrument demonstrate face validity?

3. Has the instrument been shown to be valid (content,
construct, criterion) in a similar population with
comparable disease severity to that of the current
study?

4. Can validity and reliability be generalized to the cur-
rent population and disease?


Has the instrument demonstrated reliability? 

1. Has the instrument been shown to be reliable over 
repeated administrations (test-retest) to a stable po- 
pulation, similar in characteristics and disease sever- 
ity to that of the current study? 


2. If more than one rater was involved, was interrater 
reliability established? 

3. If a proxy was involved, has the reliability of re- 
sponses provided by a proxy and the patients been 
established for this population? 


Is the instrument sufficiently responsive? 
1. Has the instrument demonstrated the ability to 
detect small, but important clinical changes? 


Are the results of the study valid? 

1. Did the author state, a priori, the desired detectable 
effect size? Has the author provided sufficient evi- 
dence or argument for choosing this effect size (clin- 
ical importance)? 

2. Has the author provided a sufficient description of 
how the questionnaire was administered? 

3. Were data collectors/patients/physicians blinded to 
the treatment, intervention, and exposure to the 
disease being studied? 

4. Were patients similar between groups before the 
intervention? 

5. If questionnaires were mailed, is there an adequate 
comparison of the characteristics of responders and 
nonresponders? 

6. Was the analysis of data appropriate? 

7. Were all participants accounted for? 
Conclusions


Measuring health from a patient’s perspective has become 
increasingly important to patients, surgeons, and health 
policy experts. As a result, the number of studies reporting 
HRQL has increased in recent years. For the purpose of con- 
ducting clinical research and evaluating studies reporting 
Common Generic Outcome Scales for Surgeons


“The trouble with measurement is its seeming simplicity.”

— Unknown


Summary


In this chapter, some of the generic health-related quality
of life (HRQL) instruments that are routinely used in surgical
clinical trials are reviewed. Commonly used disease-specific
measures are the focus of the following chapter.


Introduction


A recent review of the orthopaedic surgery literature between
1991 and 2001 found that there is growing interest in the use
of HRQL measures to obtain the patient’s perspective in
clinical outcome studies, especially in randomized controlled
trials (RCTs).¹ Furthermore, the most common patient-based
HRQL measure found in the orthopaedic surgery literature is a
generic outcome measure referred to as the Short Form-36
(SF-36).¹ Other common measures in the orthopaedic surgical
literature include disease-specific measures such as the
Musculoskeletal Functional Assessment (MFA) or the
Disabilities of the Arm, Shoulder, and Hand (DASH).¹ In
addition, the Western Ontario and McMaster Universities
Osteoarthritis Index (WOMAC), the Simple Shoulder Test, and
the Roland-Morris or Oswestry are common joint-specific
measures.¹


Key Concepts: Common Generic and Utility Measures

Generic measures

• Short Form-36 (SF-36)

• Short Form-12 (SF-12)

• Sickness Impact Profile (SIP)

• Nottingham Health Profile (NHP)


Utility measures

• Health Utilities Index (HUI)

• EuroQol-5D (EQ-5D)

• Quality of Well-Being Scale (QWB)


Generic Health-Related Quality of Life Measures 


Generic HRQL instruments are designed to capture a de- 
scription of the overall health state of a patient. Many gen- 
eric measures have more than one dimension, which can 
include physical, emotional, and social functioning.! The 
primary purpose of a generic instrument is to determine 
the overall health of a patient. Therefore, a well-designed,
responsive generic instrument would be able to capture a
patient’s hip pain leading to an arthroplasty, as well as the
effect of comorbidities such as heart disease or diabetes.'
Generic measures are not always as responsive to change 
as the disease-specific measures,” but as mentioned above, 
their primary role is to capture that broader concept of 
overall health. Generic HRQL measures have been found 
to be useful predictors of outcome.! 


Jargon Simplified: Generic Measures 

Generic health-related quality of life instruments are 
designed to capture a description of the overall health 
state of a patient. 


Short Form-36 


The most commonly used generic instrument in the ortho- 
paedic surgical literature is the Short Form-36 (SF-36).! 
The SF-36 is a multipurpose, short-form health survey con- 
sisting of 36 questions.’ The SF-36 has proven useful in sur- 
veys of general and specific populations, comparing the re- 
lative burden of diseases, and in differentiating the health 
benefits produced by a wide range of different treatments.’ 
The experience to date with the SF-36 has been documen- 
ted in nearly 4000 publications; citations for those pub- 
lished in the years 1988 through 2000 are documented 
in a bibliography by Ware? covering the SF-36 and other 
instruments in the SF family of tools. 

The SF-36 contains multifunction item scales to measure eight
domains: physical function (10 items), role physical
(4 items), bodily pain (2 items), general health (5 items),
vitality (4 items), social functioning (2 items), role
emotional (3 items), and mental health (5 items); the
remaining item asks about change in health.? The two summary
measures of the SF-36 are the physical component summary and
the mental component summary. The scores for the multifunction
item scales and the summary measures of the SF-36 vary from
zero to 100, with 100 being the best possible score and zero
being the lowest possible score (Fig. 18.1).

Fig. 18.1 Short Form-36. SF-36 scales measure physical and
mental components of health. (From Ware JE Jr, Kosinski M,
Keller SD. SF-36: Physical and Mental Summary Scales: A Users’
Manual. Boston: The Health Institute, New England Medical
Center; 1994. Reprinted by permission.)
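
To show how a scale can end up on a zero-to-100 range, the sketch below rescales a raw scale score linearly between its lowest and highest possible values. This is an assumption made for illustration, not the licensed SF-36 scoring procedure; the function name and the example raw range are hypothetical.

```python
def rescale_to_0_100(raw_score, lowest_possible, highest_possible):
    """Linearly rescale a raw scale score so that it runs from 0 to 100."""
    if highest_possible <= lowest_possible:
        raise ValueError("highest possible score must exceed the lowest")
    if not lowest_possible <= raw_score <= highest_possible:
        raise ValueError("raw score is outside the possible range")
    span = highest_possible - lowest_possible
    return (raw_score - lowest_possible) / span * 100


# Hypothetical 10-item scale with each item scored 1 to 3 (raw range 10 to 30).
print(rescale_to_0_100(20, 10, 30))
```

Rescaling in this way lets scales with different numbers of items be read on the same footing, with higher values representing better health.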

The SF-36 takes less than 15 minutes to complete.’ It can 
be self-administered or interview-administered and it is 
available in a number of languages.’ To use the SF-36, 
permission (free) must be obtained at
http://www.qualitymetric.com/products/license.


Jargon Simplified: Self-Administered versus Interview- 
Administered Questionnaires 

In a self-administered questionnaire, the research parti- 
cipant completes the questionnaire on his or her own, 
using pen and paper. 

In an interview-administered questionnaire, the indivi- 
dual administering the questionnaire reads the ques- 
tions to the research participant and records his or her 
answers. 


Short Form-12 


The SF-12 questionnaire, which was developed from the 
Medical Outcomes Study, is a self-administered, 12-item 
questionnaire that measures HRQL.° Similar to the SF-36, 
the SF-12 measures eight domains and both physical and 
mental summary scores are also obtained.’ As in the SF- 
36, each domain is scored separately from zero (lowest le- 
vel) to 100 (highest level). The instrument has been exten- 
sively validated and has demonstrated good construct va- 
lidity, high internal consistency, and high test-retest relia- 
bility.” To use the SF-12, permission (free) must be ob- 
tained at http://www.qualitymetric.com/products/license. 


Using the SF-36 versus the SF-12 


The decision to use the SF-36 versus the SF-12 is largely
practical and depends on the trial’s objectives.® The SF-12
reproduces the SF-36 summary scales (the physical component
score and the mental component score) very well and takes less
time to complete. However, the SF-36 provides more information
about the nature of differences in physical and mental health
outcomes. Some medical conditions and most treatments have
specific health effects that tend to be concentrated in some
scales. For example, arthritis typically has its greatest
impact on the bodily pain score, whereas hepatitis impacts
most on the vitality score. When there is a need to examine
all eight scales, the SF-36 is recommended over the SF-12,
which achieves less precision for all eight scales. Sample
size also matters: for studies with sample sizes over 500
patients, the SF-12 is preferred, whereas in studies with
smaller sample sizes, the SF-36 is the better instrument.


Sickness Impact Profile 


The Sickness Impact Profile (SIP) is another generic HRQL 
measure; it consists of 136 items that measure 12 distinct 
domains.’ The SIP can be self-administered or interview- 
administered and takes 20 to 30 minutes to complete.’ Pa- 
tients identify those statements that describe their experi- 
ence; then each item is weighted depending on the sever- 
ity of dysfunction.’ For each category, the scores are 
summed and expressed as a percentage of the maximum 
score possible, and higher scores represent greater dysfunc-
tion.’ Although scores can be calculated for each of the 12
individual domains, three summary scores are typically 
calculated and reported: total score (includes all domains), 
a physical score (ambulation, body care, and movement
and mobility), and a psychosocial score (emotional beha- 
vior, social interaction, alertness, and communication).’ 
The SIP is both reliable and valid.’ 


Nottingham Health Profile


The Nottingham Health Profile (NHP) was developed to be
used in epidemiological studies of health and disease.®
There are two sections to this instrument. The first part
contains 38 yes/no items in six dimensions: pain, physical
mobility, emotional reactions, energy, social isolation, and
sleep. The second section contains seven general yes/no
questions concerning daily living problems.® The NHP is
self-administered and takes ~5 to 10 minutes to complete.
Each item on the scale is weighted; weights are derived
from patients and nonpatients.® Dimension scores range
from 0 to 100, with a higher score representing a
greater health problem.® Scores are presented as a pro-
file rather than an overall score.®


Utility Health-Related Quality of Life Measures 


As discussed in the previous chapter, utility measures are 
designed to capture the overall value of a health state often 
by assessing a person’s preference for that state over an- 
other. They place a patient’s health state on a continuum 
from perfect health and well-being (a score of 1.0) to death, 
or worse than death (a score of 0). Three instruments that 
are frequently used to measure utility are the EuroQol-5D 
(EQ-5D), the Health Utility Index (HUI), and the Quality of 
Well-Being Scale (QWB). These three instruments are de- 
scribed in detail below. 


EuroQol-5D 


EuroQol-5D (EQ-5D) is a standardized instrument for use 
as a measure of health outcome. Applicable to a wide range 
of health conditions and treatments, it provides a simple 
descriptive profile and a single index value for health sta- 
tus.’ EQ-5D was originally developed to complement other 
HRQL instruments, but is now increasingly used as a stand- 
alone measure.° The EQ-5D is designed for self-completion 
by respondents and is ideally suited for use in mail surveys, 
in clinics, and face-to-face interviews.° It is a simple ques- 
tionnaire, taking only a few minutes to complete.? 

The EQ-5D is a five-item instrument that is designed to
allow people to describe their health state across five di-
mensions.!° Each dimension has three response categories,
which combine to give 243 possible health states.'° Preference-based
weighting systems based on time trade-off studies in an- 
other sample are applied mathematically to the health 
state. The preference weight allows a single numeric score 
from slightly less than zero (theoretically worse than 
death) to one (best health state). EQ-5D scores are used 
in economic appraisals (such as cost-utility analyses) in 
the construction of quality-adjusted life-years (QALYs) 
for the calculation of cost per QALY gained and its com- 
parison across interventions. The EQ-5D is simple to use. 
Notable is its use in persons with musculoskeletal condi- 
tions.'' Information on the EQ-5D is available at
www.euroqol.org.
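
The figure of 243 health states follows directly from five dimensions with three response categories each. A minimal sketch of that count, assuming the levels are simply numbered 1 to 3 (the numbering is an illustrative choice, not taken from the instrument's documentation):

```python
from itertools import product

DIMENSIONS = 5  # five dimensions, as stated in the text
LEVELS = 3      # three response categories per dimension

# Enumerate every combination of levels, from (1, 1, 1, 1, 1) to (3, 3, 3, 3, 3).
states = list(product(range(1, LEVELS + 1), repeat=DIMENSIONS))

print(len(states))           # number of distinct health states
print(LEVELS ** DIMENSIONS)  # the same count, computed as 3 to the power 5
```

Each tuple is one descriptive health state; a preference weight would then be attached to each state by the scoring system, which is not reproduced here.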


Health Utilities Index 


The Health Utilities Index (HUI) is a health status and qual- 
ity of life assessment instrument developed as an indirect 
method of measuring utilities (preferences) in clinical 
trials and other studies.!?-'> The HUI has undergone extensive
reliability and validity testing and has proven to
be comprehensive, reliable, responsive, and valid.!?-!5 To
determine attribute levels, responses to the questionnaire 
are converted using standard algorithms from the Health 
Utilities Index Mark 2 (HUI2) and Mark 3 (HUI3) multiat- 
tribute health status classification systems. These attribute 
levels are combined with published scoring functions!” 
to calculate utility scores of overall HRQL. 

The HUI2 and HUI3 health status classification systems
are complementary. Together they provide descriptive
measures of ability or disability for health-state attributes,
and descriptions of comprehensive health status.'° The
HUI2 is composed of seven attributes or dimensions
and the HUI3 is composed of eight attributes (Table
18.1).!2-> For overall health status, the HUI2 and HUI3
utility scales of HRQL are defined such that dead = 0.00 and
perfect health = 1.00. The HUI2 describes 24,000 unique
health states and the HUI3 describes 972,000 unique
health states; these totals are the number of possible
combinations of levels across the attributes (the product of
the number of levels in each attribute).!?-" Utilities derived
from responses to HUI questionnaires may be used to calculate
QALYs. There is a fee associated with using the HUI2 and
HUI3; additional information is available at
www.fhs.mcmaster.ca/hug.


Table 18.1 Dimensions of the Health Utilities Indexes: HUI2 and HUI3

HUI2 | HUI3
Sensation | Vision
Mobility | Hearing
Emotion | Speech
Cognition | Ambulation
Self-care | Dexterity
Pain | Emotion
Fertility | Cognition
 | Pain

Source: Data from Torrance GW, Feeny DH, Furlong WJ, Barr RD,
Zhang Y, Wang Q. Multiattribute utility function for a comprehensive
health status classification system: Health Utilities Index
Mark 2. Med Care. 1996;34:702-722 and Furlong WJ, Feeny DH,
Torrance GW, Barr RD. The Health Utilities Index (HUI) system for
assessing health-related quality of life in clinical studies. Ann Med
2001;33:375-384.
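
The health-state totals quoted for the HUI2 and HUI3 can be reproduced by multiplying the number of levels in each attribute. The sketch below is illustrative only: the per-attribute level counts are not given in this chapter and are assumed here, so they should be checked against the published classification systems before being relied on.

```python
import math

# ASSUMPTION: levels per attribute are not stated in the text; these values
# are assumed for illustration and chosen to match the quoted totals.
HUI2_LEVELS = {
    "sensation": 4, "mobility": 5, "emotion": 5, "cognition": 4,
    "self_care": 4, "pain": 5, "fertility": 3,
}
HUI3_LEVELS = {
    "vision": 6, "hearing": 6, "speech": 5, "ambulation": 6,
    "dexterity": 6, "emotion": 5, "cognition": 6, "pain": 5,
}


def count_health_states(levels):
    """Number of unique health states = product of the levels per attribute."""
    return math.prod(levels.values())


print(count_health_states(HUI2_LEVELS))
print(count_health_states(HUI3_LEVELS))
```

Under these assumed level counts the products agree with the 24,000 and 972,000 states quoted in the text.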


Quality of Well-Being Scale 


The Quality of Well-Being Scale (QWB) is an interviewer-
administered instrument that measures well-being in in-
dividuals based on the social preferences that society gen-
erally associates with a person's level of functioning at a
specific point in time.'® For each individual, the scale
averages values across three ratings of functioning (mobi-
lity, physical activity, and social activity) and across one
rating of symptomatic complaints that might inhibit function.'®
Based upon the interview, a single functional level is as-
signed to each respondent within each of the four domains,
and each level is given a weight based on large population
surveys of preferences of each functional level.!° The QWB
scoring allows placement of each individual on a conti-
nuum of wellness ranging from 0 (for dead) to 1.0 for
asymptomatic full function." Information on the QWB is
available at www.outcomes-trust.org.


Conclusions


When selecting a generic measure, it is important to look
at the objectives of your research initiative to ensure that
the HRQL instrument will meet them. Potential instru-
ments should be researched and their validity, reliability,
and responsiveness to change checked.

Common Disease-Specific Outcome Scales for Surgeons


“In the name of certainty, the greatest crimes have been committed against humanity.”

– Carlos Fuentes


Summary


Multiple disease-specific or condition-specific health-re-
lated quality of life (HRQL) measures have been developed
for orthopaedic and surgical conditions. Some instruments
are widely used, and have proven to be valid, reliable, and
sensitive to change. In this chapter, several disease- or con-
dition-specific measures commonly used in orthopaedic
research are reviewed.


Introduction


In recent years, the orthopaedic research community has
committed a significant amount of time, energy, and re-
sources to developing patient-based measures of HRQL as
they move to understand the value in describing the bur-
den of a disorder and predicting future outcomes. Typi-
cally, the taxonomy of outcome measures places disease- or
condition-specific measures after generic measures. Beaton
and Schemitsch! categorized disease- or condition-specific
measures further into three categories: (1) regional measures,
(2) joint/disease-specific measures, and (3) patient-specific
measures. All three categories will be expanded upon in
this chapter and examples and descriptions of commonly
used scales are provided.

A recent supplement of the Journal of Orthopaedic
Trauma provided an excellent guide to the currently avail-
able instruments for measuring outcomes in patients with
musculoskeletal disease.” The instruments presented in
this supplement were identified by reviewing all abstracts
from the Orthopaedic Trauma Association (OTA) and
American Academy of Orthopaedic Surgeons (AAOS)
trauma section meetings from 1997 to 2002.” Table 19.1
lists the disease-specific instruments that are described
in this supplement, as well as the area that they were de-
signed to assess. The supplement provides information
on the name of the instrument, what it is designed to as-
sess, method of administration, how to obtain the instru-
ment, costs, method of design, statistical validation, nor-
mative data available, disease-specific data available, refer-
ences, how the instrument is scored, format and number of
questions, time for administration, and a copy of the in-
strument.


Table 19.1 Disease-Specific Instruments Used in Orthopaedic Trauma Research

Instrument | Area It Is Designed to Assess
Hannover Polytrauma Score | Severity of overall injury
Ankle Osteoarthritis Scale | Ankle osteoarthritis
Clinical Grade Scale-d’Aubigne and Postel | Hip function
Creighton-Nebraska | Calcaneus fractures
Harris Hip Score | Patients with traumatic arthritis of the hip
Hip Rating Questionnaire | Outcome after total hip replacement
Hospital for Joint Diseases Hip Fracture Recovery Score | Functional recovery for ambulatory hip fracture patients
Iowa Calcaneal Score | Calcaneus
Kaikkonen Ankle Scale | Functional recovery after ankle injuries
Knee Function-Rasmussen | Knee
Knee Injury and Osteoarthritis Outcome Score (KOOS) | Short- and long-term patient relevant outcomes after knee injury
Knee Society Clinical Rating System | Knee function and patient function after total knee arthroplasty
Lower Extremity Measure | Physical function
Majeed Score | Function after pelvic injury
Maryland Foot Score | Foot injuries
Mazur Ankle Evaluation Grading System | Ankle function
Modified Hospital for Special Surgery Knee Scoring System | Knee function
Olerud and Molander Scoring System | Symptoms after ankle fracture
Orthopädische Arbeitsgruppe Knie (OAK) | Functional knee stability
Ovadia and Beals | Tibial plafond outcome
Oxford Knee Questionnaire | Functional outcome
Western Ontario McMaster Arthritis Index (WOMAC) | Knee and hip osteoarthritis
Schatzker and Lambert | Supracondylar femur fractures
Thompson Epstein Scale | Outcome after traumatic hip dislocation
VAS for Calcaneal Fractures | Calcaneus
American Shoulder and Elbow Surgeons (ASES) Assessment Form | Shoulder and elbow
Clinical Evaluation Scoring | Upper extremity function
Constant-Murley Clinical Method of Functional Assessment of the Shoulder | Shoulder function
Disabilities of the Arm, Shoulder and Hand (DASH) | Disability experienced by people with upper limb disorders
Elbow Evaluation | Elbow function
Gartland and Werley | Colles’ fractures
Hospital for Special Surgery Shoulder Score | Shoulder
Mayo Elbow Performance Score | Elbow function
Neer Score | Proximal humerus fracture
Oxford Shoulder Score (OSS) | Outcome of shoulder surgery
Rowe Rating Sheet | Shoulder function
Shoulder Pain and Disability Index (SPADI) | Pain and disability in shoulder
Shoulder Rating Questionnaire | Severity of symptoms in shoulder
Subjective Shoulder Rating System | Subjective shoulder complaints
UCLA Shoulder Rating Scale | Shoulder
Western Ontario Rotator Cuff Index (WORC) | Quality of life for patients with rotator cuff disease
Posttraumatic Stress Disorder | Posttraumatic stress
Social Function Score | Dependence on social welfare system
Stanmore Functional Rating | Pain and level of activity

Source: Data from Swiontkowski M, Agel J. Guide to currently available instruments for measuring outcomes in patients with
musculoskeletal disease. J Orthop Trauma 2006; 20(8, Suppl):S65-S146.
Regional Health-Related Quality of Life Measures


Regional measures are designed to capture more than one 
joint, area, or disorder. For example, some measures are 
designed to measure any disorder in a limb, assuming 
that at a level of disability the impact will not vary signifi- 
cantly, whether it is the ankle or a knee that is affected.' 


Jargon Simplified: Regional Measures 
Regional measures are designed to capture more than 
one joint, area, or disorder.! 


In the upper limb, there are outcome measures such as the 
Disabilities of the Arm, Shoulder, and Hand (DASH). The 
DASH Outcome Measure is a 30-item, self-report question- 
naire designed to measure physical function and symp- 
toms in patients with one or several musculoskeletal disor- 
ders of the upper limb.’ The questionnaire was designed to 
help describe the disability experienced by people with 
upper-limb disorders and to monitor changes in symptoms 
and function over time. Testing has shown that the DASH 
performs well in both these roles.? It is designed to be sen- 
sitive to disability in the hand as well as in the shoulder.! 
The DASH has been shown to be reliable, valid, and respon- 
sive to change in multiple studies.! The DASH is currently 
available in 17 languages. It can be downloaded without 
charge from http://www.dash.iwh.on.ca. 

In the lower extremities, there are also regional mea- 
sures such as the Toronto Extremity Salvage Score 
(TESS).*° The TESS is a 30-item scale designed to measure 
physical function in patients undergoing limb salvage for 
extremity sarcoma.*° The TESS has a demonstrated relia- 
bility, validity, and responsiveness to change in a cancer 
population.*° The TESS is widely used in the musculoske-
letal oncology field and in limb salvage.' The TESS is avail-
able in several languages and may be acquired from its de-
veloper, Dr. Aileen Davis, Toronto Rehabilitation Institute,
Toronto, Canada.!

The Short Musculoskeletal Functional Assessment 
(SMFA) is the broadest regional measure: it is designed 
for use across any musculoskeletal condition, in the upper 
or lower extremity.° This measure would fit into a category 
of its own; however, previous reviews have defined it as a 
regional measure.' The SMFA is a 46-item questionnaire
that is a shortened version of Swiontkowski’s full Musculos-
keletal Functional Assessment.®” The SMFA comprises
two main scores: the function index and the bothersome
index. The function index is subdivided into four sub-
scales: daily activities, emotional status, arm and hand
function, and mobility. The SMFA has been tested in pa- 
tients with musculoskeletal disorders, as this is the target 
population. The SMFA was designed to describe the various
levels of function in people with musculoskeletal disor-
ders, as well as to monitor change over time. It can be down-
loaded without charge from
http://www.ortho.umn.edu/ortho/research.html.

Regional measures offer a practical alternative to joint-/ 
disease-specific measures in that they can afford “one 
measure for all” in a busy, mixed orthopaedic practice as 
long as their performance is shown to be adequate in 
each patient group.' Often, direct comparisons will show 
that the regional measures have much the same content 
as more specific ones. However, there are some circum- 
stances where more-specific measures are needed, which 
would mandate the selection of a joint- or disease-specific 
measure.! 


Joint-/Disease-Specific Health-Related Quality of Life Measures 


Joint- or disease-specific measures are by far the most 
common in terms of numbers of measures available; one 
could even argue that there are perhaps too many, preclud-
ing comparisons across studies.' For example, in the
shoulder region there are at least 13 shoulder-specific 
questionnaires.’ The primary benefit of joint-/disease-spe- 
cific measures is their ability to detect focused types of 
changes.! 

An example of a joint-/disease-specific measure is the
Western Ontario McMaster Osteoarthritis Index (WOMAC).
The WOMAC is self-administered and assesses the
three dimensions of pain, disability, and joint stiffness in
knee and hip osteoarthritis using a battery of 24 ques-
tions. It is a valid, reliable, and responsive measure of out-
come, and has been used in several studies involving frac-
ture patients.'° The most commonly used response scale is
a 5-point Likert scale; however, there is a visual analogue
scale version. The WOMAC is the most commonly used
and endorsed patient-based outcome after hip or knee ar-
throplasty and has been widely used and tested in the field
of osteoarthritis and rheumatoid arthritis.! The WOMAC is
available through its developer, Dr. Nicholas Bellamy; visit
www.auscan.org.
Patient-Specific Health-Related Quality of Life Measures


Patient-specific measures are individualized to the extent 
that patients are able to choose their own items; generally, 
there are approximately five items that reflect symptoms 
and daily activities.! Beaton and Schemitsch identified
eight patient-specific measures in the literature. 

Because of the individualized nature of each patient- 
specific measure, each instrument is potentially useful 
for orthopaedic patients.’ The domains included in each 
patient-specific measure are different. For example, the 
Canadian Occupational Performance Measure (COPM) 
looks at self-care, productivity, and leisure based on their 
importance.'! The Patient-Specific Index (PASI) looks at 
two symptoms and three daily activities that are chosen 
by their level of difficulty,'? which could result in slightly 
different concepts being elicited for the content of the 
measure.' Patient-specific measures have been found to 
be very useful in clinical practice, where they can aid in 
problem identification and monitoring.! Although they 
generally take longer to complete, the time is often well 
spent for the information that is obtained. There is some 
uncertainty about whether it is valid to group scores into 
an average across many patients because of the variability 
in the content of the different questionnaires.' For exam- 
ple, two people could have the same numeric score on a 
patient-specific measure, but one has chosen very light ac 


Conclusions 


Many disease-specific or condition-specific HRQL instru- 
ments have been developed for orthopaedic and surgical 
conditions. Some widely used instruments have proven 
to be valid, reliable, and sensitive to change. Other instru- 
Common Ways to Present Treatment Effects


“If you want to inspire confidence, give plenty of statistics. It does not matter that they
should be accurate, or even intelligible, as long as there is enough of them.”

– Lewis Carroll


Summary


The different ways in which the results of randomized con-
trolled trials may be presented are discussed in this chap-
ter. Most patient treatments have some form of effect and
the magnitude of this effect can be presented in many
ways, some more clinically useful than others.


Introduction


As clinicians, we look to the literature to help guide us in
the management of our patients. In assessing an article,
we look at the methodology, the study population, and
so on, but our real interest is in the results. These results
can influence our treatment decisions, both positively
and negatively, depending, in part, on how they are pre-
sented to us. We need to understand what the results
mean and how to apply them to our patients. Then, in ap-
plying results to patients, we use them to help guide treat-
ment. From the study, we have also learned the risks and
benefits of the treatment presented, which we share
with the patient to help him or her make a truly informed
decision. Unfortunately, the way results are presented may
cause confusion and difficulties in interpretation with re-
spect to their clinical applicability. Treatment effects may
be magnified or reduced depending on how or what is re-
ported. Data presentation can get lost in jargon and num-
bers; however, treatment decisions may be based on it,
so an understanding of various data presentation methods
is pivotal to patient care.


Data Presentation


Data can be presented in many different ways; however,
data can be fundamentally grouped into two categories:
probabilities (given generally as proportions) and natural
frequencies (essentially “numbers”).!


Jargon Simplified: Probability
Probability is “the proportion of times an event would
occur in a large number of similar repeated trials.””


Probabilities


Gigerenzer and colleagues! have explored how we inter-
pret results of trials and categorize probability into sin-
gle-event probabilities, conditional probabilities, and rela-
tive risk (RR).


Key Concepts: Hypothetical Examples

Single-Event Probability: “There is a 2% chance of having
a wound infection after this procedure.”

Conditional Probability: “There is a 90% chance of having
a nonunion if your bone scan is positive.”


Single-Event Probability 


A single-event probability is exemplified in the statement, 
“With this procedure there is an 80% chance it will remove 
your pain.” As Gigerenzer writes, there is ample room for 
miscommunication with this type of statement. Patients 
may construe something other than what is meant because 
there is no real point of reference for them.! Does this mean 
that 80% of the time they will not have a surgical problem? 
Or that the surgery will remove 80% of the pain? Or, “there 
is an 80% chance I will be completely pain-free?” One can 
see how there are many different interpretations of this 
statement leaving room for confusion.'! Thus, there may 
be discordance as to what we believe we are saying as phy- 
sicians and what patients are hearing and understanding. 


Conditional Probability 


Conditional probabilities are based on statements such as
“if X happens, then your chance of having Y is Z.” Consider
the results of a meta-analysis of diagnostic studies on mak-
ing the diagnosis of anterior cruciate ligament (ACL) injury
from the clinical exam. The authors state that “The pooled
sensitivity for the Lachman test was 85%.”? This can be ex-
pressed as a conditional probability as follows: “If someone
has an ACL deficient knee, they will have a positive Lach-
man test 85% of the time.” This can very easily be confused
with, “If I elicit a positive Lachman on clinical exam, then
there is an ACL tear 85% of the time,” which means some-
thing completely different (it is actually the positive pre-
dictive value, which is different from the sensitivity). It
can be difficult to distinguish between these two
statements in busy clinics, when teaching trainees or med-
ical students, or when discussing tests with patients. When
doctors were posed a question with a problem to solve given
conditional probabilities, there was a significant inability
to provide the correct answer (incidentally, when this
same problem was given as a natural frequency, the num-
ber of correct answers was significantly higher).* Thus,
there may be confusion as to how to interpret probabilities
of this type.
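
The difference between sensitivity and positive predictive value is easiest to see with whole numbers. The sketch below uses the 85% sensitivity quoted above; the specificity (90%) and the prevalence of ACL tears among examined knees (10%) are hypothetical values chosen only for illustration and do not come from the meta-analysis.

```python
def positive_predictive_value(sensitivity, specificity, prevalence, n=1000):
    """Work through n imaginary patients as natural frequencies."""
    with_tear = prevalence * n                       # knees that truly have an ACL tear
    without_tear = n - with_tear                     # knees with an intact ACL
    true_positives = sensitivity * with_tear         # tears with a positive Lachman
    false_positives = (1 - specificity) * without_tear  # intact knees with a positive Lachman
    return true_positives / (true_positives + false_positives)


# Sensitivity from the text; specificity and prevalence are ASSUMED values.
ppv = positive_predictive_value(sensitivity=0.85, specificity=0.90, prevalence=0.10)
print(f"Sensitivity: 85%, positive predictive value: {ppv:.0%}")
```

Under these assumed figures, of every 1000 knees examined, 100 have a tear and 85 of those test positive, while about 90 of the 900 intact knees also test positive. A positive Lachman test would then indicate a tear in roughly half of cases, far from 85%.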


Relative Risk 


The probability commonly used when presenting the re- 
sults of a trial (generally, a randomized controlled trial) is 
the relative risk (RR), which is, in fact, a proportion of pro- 
portions. RR is generally given as the risk of an event in 
treatment group A/the risk of an event in treatment group 
B, or if using the terms control versus experimental groups, 
the incidence of an event in the experimental group/the in- 
cidence of an event in the control group. 


Jargon Simplified: Event 

An event is any time a predefined outcome of interest oc- 
curs. Events are used to measure the effectiveness of a 
treatment. Investigators should distinguish between 
primary events and secondary events. A primary event 
is the outcome variable that is designated as key in the 
design and the analysis of the results of a study. Second- 
ary events are variables that are also useful in the evalua- 
tion of a treatment effect. The types of events recorded 
are dependent on the trial and what is being measured. 


Jargon Simplified: Relative Risk 
Relative Risk: Experimental Event Rate (EER)/Control 
Event Rate (CER) 


An example of RR is given in Table 20.1. The risk of an event
in the experimental group is 10/100, which is 10%. The risk
of an event in the control group is 20/100, which is 20%.
Therefore, the RR of having an event in the experimental
group as compared with the control group is 10%/20%,
which is 0.5. This means that someone is half as likely to
have had an event (or outcome of interest) in the treatment
group as compared with the control group. RR can there-
fore be >1 or <1; an RR of 1 indicates no difference
between the groups. That is, when the RR is 1, the experi-
mental event rate (EER) is equal to the control event rate
(CER); therefore, there is no treatment effect. Similarly, if the
RR is 2, the treatment results in a twofold increased risk of
experiencing an event. By convention, numbers <1 are
usually expressed as (1 - RR) or the relative risk reduction.
That is, if the EER/CER = 0.5 as above, there is a 50% (1 - 0.5
= 0.5, which in turn is 50%) reduction in the risk of an
event.”


Table 20.1 An Example of Relative Risk

Group | Event | No Event
Experimental (or Treatment) Group (N = 100) | 10 | 90
Control Group (N = 100) | 20 | 80
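
The relative risk arithmetic for Table 20.1 can be sketched in a few lines. The function and variable names are illustrative choices, not taken from the text.

```python
def relative_risk(events_exp, n_exp, events_ctrl, n_ctrl):
    """RR = experimental event rate (EER) / control event rate (CER)."""
    eer = events_exp / n_exp
    cer = events_ctrl / n_ctrl
    return eer / cer


# Counts from Table 20.1: 10/100 events (experimental), 20/100 events (control).
rr = relative_risk(10, 100, 20, 100)
rrr = 1 - rr  # relative risk reduction, used when RR < 1

print(f"RR = {rr:.2f}, relative risk reduction = {rrr:.0%}")
```

An RR below 1 favors the experimental group, an RR of 1 indicates no difference, and an RR above 1 indicates increased risk with the experimental treatment.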

Although this is the most common way of expressing
treatment results, it does have its downside. The main pro-
blem in discussing measures of effect in relative terms is
that it does not take into account the overall baseline risk
of the patient population, nor does it take into account
the rate of actual events. If the patient’s baseline risk for
an event is low, then a 50% reduction in the risk of this
event with a particular treatment may mean nothing at
all, whereas if a patient has a very high baseline risk, then
a 50% reduction may be very significant. For instance, if
we quote a hypothetical risk reduction in surgical site in-
fections of 50% for treatment A as compared with treat-
ment B, this makes treatment A sound quite good. How-
ever, in young healthy patients the risk of infection may
only be 1% and in elderly diabetic patients it may be 10%.
Clearly, we would be more swayed to choose treatment A
(all other things being equal) over treatment B for dia-
betics. For the young healthy group, the benefit would be
negligible, as a 50% reduction in risk from 1% to 0.5% may
be clinically irrelevant. In addition, if the risk of the event
itself is very low (for example, 1%), then reducing this by
50%, which is potentially a significant treatment effect,
may in fact also be clinically irrelevant. This is illustrated
further in Chapter 24.
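
The effect of baseline risk can be made concrete with the hypothetical figures above: the same 50% relative reduction applied to a 1% and a 10% baseline risk of infection. This is a minimal sketch; the group labels and function name are illustrative.

```python
def risk_after_treatment(baseline_risk, relative_risk_reduction):
    """Risk remaining after applying a relative risk reduction to a baseline risk."""
    return baseline_risk * (1 - relative_risk_reduction)


RRR = 0.50  # hypothetical 50% relative risk reduction, as in the text

for label, baseline in [("young healthy", 0.01), ("elderly diabetic", 0.10)]:
    treated = risk_after_treatment(baseline, RRR)
    absolute_reduction = baseline - treated
    print(f"{label}: {baseline:.1%} -> {treated:.1%} "
          f"(absolute reduction {absolute_reduction:.1%})")
```

The relative reduction is identical in both groups, but the absolute reduction is ten times larger in the higher-risk group, which is why relative measures alone can mislead.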


Odds Ratios 


There is a significant distinction between the odds ratio 
(OR) and relative risk. RR as explained above is a ratio of 
probabilities whereas the odds ratio is, of course, a ratio 
of the odds. It is again helpful to look at our table. This 
time we will change the experimental group to treatment 
A and the control group to treatment B, which is common 
in orthopaedic surgical trials. 

Table 20.2 An Example of the Odds Ratio

Group | Event | No Event
Treatment A (N = 100) | 10 | 90
Treatment B (N = 100) | 20 | 80


If we look at Table 20.2, the odds of having an event ver-
sus not having an event with treatment A is 10/90 or 1 in 9
against having the event. However, the risk or probability
of having an event with treatment A is 10/100 or 1 in 10
(10%). We can do the same for treatment B: the odds are
20/80 or 1 in 4 against having an event and the risk of ex-
periencing the event is 20%. Therefore, one can see that,
expressed as a ratio, the OR is (1/9)/(1/4), which is 0.44, and
the RR is 10%/20%, which is 0.5. One can see that as the
number of events gets smaller, the OR approximates the
RR (this is because the denominator used in calculating
odds gets closer to the total number in the sample, the de-
nominator used to calculate risk). So, what is the difference
when discussing the results? One of the main differences is
that RRs are more intuitively understood by clinicians and
patients. Another is that in trials with high event rates, the
OR can seemingly overstate the differences between treat-
ment groups.° In this case, researchers suggest that using
the OR be limited to study designs such as the case-control
study for two reasons: (1) the underlying event rate is low,
and (2) the true incidence of the disease being measured is
not known (that is, we don’t actually know the denomina-
tor to calculate RR).”®
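
The contrast between the OR and the RR, and the way the two converge when events are rare, can be checked numerically. The first comparison uses the counts from Table 20.2; the rare-event counts (1 and 2 events per 100) are hypothetical and added only to illustrate the convergence.

```python
def odds_ratio(events_a, n_a, events_b, n_b):
    """Ratio of the odds (events / non-events) in group A to those in group B."""
    odds_a = events_a / (n_a - events_a)
    odds_b = events_b / (n_b - events_b)
    return odds_a / odds_b


def relative_risk(events_a, n_a, events_b, n_b):
    """Ratio of the risk (events / total) in group A to that in group B."""
    return (events_a / n_a) / (events_b / n_b)


# Table 20.2: treatment A 10/100 events, treatment B 20/100 events.
print(f"OR = {odds_ratio(10, 100, 20, 100):.2f}, RR = {relative_risk(10, 100, 20, 100):.2f}")

# Hypothetical rare-event trial: 1/100 versus 2/100 events.
print(f"OR = {odds_ratio(1, 100, 2, 100):.2f}, RR = {relative_risk(1, 100, 2, 100):.2f}")
```

With the Table 20.2 counts the OR (about 0.44) sits further from 1 than the RR (0.5); with the rare-event counts the two are nearly identical.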

The problem with discussing results in terms of prob- 
abilities, proportions, or ratios of odds is that people 
(both clinicians and patients) can interpret their meaning 
differently. This is because, as stated previously, they do 
not necessarily take into consideration the baseline risk 
of the individual patient to whom the results are to be ap- 
plied.! 


Natural Frequencies 


Gigerenzer argues that if probabilities are expressed as 
natural frequencies, which are essentially whole numbers, 
it is much easier to understand as it places a frame of refer- 
ence on the numbers. For instance, if we take the single- 
event probability above, “with this procedure there is an 
80% chance it will remove your pain” and convert it to a 
natural frequency we would say that for every 10 people 
operated on, 8 patients will have no pain after the proce- 
dure. This places a frame of reference on the numbers, 


Putting It All Together 


Let us take a hypothetical example of two studies and list 
the ways the results may be interpreted (Table 20.3). 
This will serve to illustrate the pros and cons of how they 
have been presented. 

From Table 20.4, it is thus easy to see the problem clini- 
cally when dealing with results in relative terms. For both 
trials the RRs and RR reductions are the same. That is, there
is a 50% reduction in the risk of having an event in the treatment
group as compared with the control group. However,
this is in the face of a 10-fold difference in event rates. In 
trial 1, most would feel that this is a clinically irrelevant dif- 
which possibly makes it easier to understand.' This
brings up the concept of the absolute risk difference and 
the number needed to treat (NNT). 


Absolute Risk and Number Needed to Treat 


The absolute risk difference is most commonly given as a
percentage (a proportion) for ease of use. It is the
proportion of events in the experimental group minus
the proportion of events in the control group.


Jargon Simplified: Absolute Risk Reduction 
Absolute Risk Reduction (ARR): |Experimental Event 
Rate (EER) - Control Event Rate (CER) | 


According to Table 20.1 and Table 20.2, this would be | 10% 
- 20%|, which equals 10%. This number, however, is di- 
mensionless and lacks significant clinical utility.’ Its in- 
verse, the number needed to treat, places a fixed number 
on the results and turns abstract probabilities into a
clearer clinical picture.


Jargon Simplified: Number Needed to Treat 
Number Needed to Treat (NNT): 1/Absolute Risk Reduc- 
tion (ARR) 
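
The two definitions above translate directly into code. This is a minimal sketch with illustrative function names (they do not come from the original text); event rates are passed as proportions between 0 and 1.

```python
def absolute_risk_reduction(eer, cer):
    """ARR = |EER - CER|, with both event rates given as proportions."""
    return abs(eer - cer)


def number_needed_to_treat(eer, cer):
    """NNT = 1 / ARR. Undefined when the two event rates are equal."""
    arr = absolute_risk_reduction(eer, cer)
    if arr == 0:
        raise ValueError("NNT is undefined when EER equals CER")
    return 1 / arr


# Event rates of 10% and 20%, as in the example in the text.
print(absolute_risk_reduction(0.10, 0.20))
print(number_needed_to_treat(0.10, 0.20))
```

In practice the NNT is usually reported as a whole number of patients.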


For instance, in a meta-analysis of trials comparing the 
nailing of humeral shaft fractures with the plating of 
such fractures, the RR reduction of having a reoperation 
was 74% in favor of plating.’ That is to say in the treatment 
of humeral shaft fractures, there was a 74% decreased 
chance of having to undergo a reoperation when using a 
plate as compared with using a nail. The corresponding ab- 
solute risk reduction was 10% between the treatment 
groups. When this is converted to the number needed to 
treat, we see that 1/0.1 is equal to 10. Therefore, for every 
10 patients treated with plates, one reoperation could be 
avoided (this far more clearly “hits home”). It gives treating 
surgeons a concrete number to work with and is arguably 
much easier for the clinician to understand and subse- 
quently explain to patients.® 


Table 20.3 A Sample Comparison of Treatment Results
from Two Study Groups

                             Experimental Group   Control Group
Number of Patients           100                  100
Number of Events: Trial 1    1 (1%)               2 (2%)
Number of Events: Trial 2    10 (10%)             20 (20%)

Table 20.4 A Sample Comparison of Relative Risks from
the Two Study Groups

Experimental vs. Control                 Trial 1        Trial 2
Relative risk: EER/CER                   1%/2% = 0.5    10%/20% = 0.5
Relative risk reduction: 1 - RR          50%            50%
Absolute risk reduction: |EER - CER|     1%             10%
Number needed to treat: 1/ARR            100            10

Abbreviations: ARR, absolute risk reduction; CER, control event
rate; EER, experimental event rate.


ference. That is to say, reducing someone’s risk from 2% to 
1% may not be clinically relevant. This depends, in part, on
what the event is; if it is mortality, then there may be some
clinical relevance. The results are strengthened and made
clear, however, by looking at the number of patients 
needed to treat. In trial 1, for every 100 patients treated, 
one event could be prevented. In trial 2, for every 10 pa- 
tients treated, one event is prevented. This allows for easier 
information translation when teaching trainees and when 
talking to patients as well. It also promotes our under- 
standing of the treatment effects on the patients, which, 
in turn, facilitates discussions of the economics and risk/ 
benefit ratios surrounding different treatments.!! 
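
The comparison of the two hypothetical trials in Table 20.3 and Table 20.4 can be reproduced in a few lines. This is a sketch only; the function name `summarize` and the dictionary keys are illustrative choices, not part of the original text.

```python
def summarize(events_exp, n_exp, events_ctl, n_ctl):
    """Return RR, RRR, ARR, and NNT for one two-group trial."""
    eer = events_exp / n_exp   # experimental event rate
    cer = events_ctl / n_ctl   # control event rate
    rr = eer / cer
    arr = abs(eer - cer)
    return {"RR": rr, "RRR": 1 - rr, "ARR": arr, "NNT": 1 / arr}


# Counts from Table 20.3: 100 patients per group in both trials.
trials = {"Trial 1": (1, 100, 2, 100), "Trial 2": (10, 100, 20, 100)}
for name, counts in trials.items():
    result = summarize(*counts)
    print(name, {key: round(value, 3) for key, value in result.items()})
```

Both trials give the same RR and RR reduction, while the ARR and NNT differ 10-fold, which is the point the tables make.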


Examples from the Literature: Example of How to Interpret
Trial Results

Source: Canadian Orthopaedic Trauma Society. Non-un- 
ion following intramedullary nailing of the femur with 
and without reaming. Results of a multicenter rando- 
mized clinical trial. J Bone Joint Surg Am 2004; 85-A 
(11):2093-2096.!? 

Background: Intramedullary nailing of the femur with- 
out reaming of the medullary canal has been advocated 
as a method to reduce marrow embolization to the lungs 
and the rate of infection after open fractures. The use of 
nailing without reaming, however, has been associated 
with lower rates of fracture-healing. The purpose of 
this prospective study was to compare the rate of union 
of femoral shaft fractures following intramedullary nail- 
ing with and without reaming. 

Methods: Two hundred and twenty-four patients were
enrolled in a multicenter, prospective, randomized clinical
trial to compare nailing without reaming and nailing
with reaming. One hundred and six patients with 107 femoral
shaft fractures were treated with a smaller diameter
nail without reaming of the canal, and 118 patients
with 121 fractures had reaming of the canal and
insertion of a relatively larger diameter nail. Patients
were followed at 6-week intervals until union occurred
or a nonunion was diagnosed.

Results: The two groups were comparable with regard 
to the measured patient and injury characteristics. Eight 
(7.5%) of the 107 fractures in the group without reaming 
had a nonunion compared with two (1.7%) of 121 frac- 
tures in the group with reaming (P = 0.049). The relative 
risk of nonunion was 4.5 times greater (95% confidence 
interval = 1 to 20) without reaming and with use of a 
relatively small-diameter nail. 

Conclusion: Intramedullary nailing of femoral shaft fractures
without reaming results in a significantly higher
rate of nonunion compared with intramedullary nailing
with reaming.


According to the results of this study,'* for every 17 pa- 
tients treated with a reamed femoral nail, one nonunion 
can be prevented (Table 20.5 and Table 20.6). 

Once we understand how to interpret the results of a
trial, we can then assess all events that occur in the trial.
For example, were there any complications? If so, what
was the rate? Conversely, what is the number needed to
harm? Once these results are expressed clearly, we can
more easily see what the risks and benefits of a treatment
are.


Table 20.5 Event Rate of Nonunion of Femoral Shaft
Fractures following Intramedullary Nailing with and
without Reaming

                                  Femoral Nail     Femoral Nail
                                  with Reaming     without Reaming
Event Rate (an event in           2/121 (1.7%)     8/107 (7.5%)
this case is a nonunion)

Source: Data from Canadian Orthopaedic Trauma Society. Non-
union following intramedullary nailing of the femur with and
without reaming. Results of a multicenter randomized clinical
trial. J Bone Joint Surg Am 2004; 85-A(11):2093-2096.
Table 20.6 Relative Risk of Nonunion of Femoral Shaft Fractures following Intramedullary Nailing
with and without Reaming

Relative risk of nonunion:                                   1.7%/7.5% = 0.23
Reaming vs. no reaming                                       or 23%

Relative risk reduction of nonunion with reaming             1 - 0.23 = 0.77
compared with no reaming (i.e., we reduce the risk of        or 77%
getting a nonunion by 77% when reaming the femoral
canal prior to nail insertion)

Absolute risk reduction                                      |0.017 - 0.075| = 0.058 (5.8%)

Number needed to treat (i.e., the number of patients         1/0.058 ≈ 17
that need to be treated with a reamed femoral nail to
prevent one nonunion)

Source: Data from Canadian Orthopaedic Trauma Society. Non-union following intramedullary nailing of the femur with and without
reaming. Results of a multicenter randomized clinical trial. J Bone Joint Surg Am 2004; 85-A(11):2093-2096.
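
The entries in Table 20.6 can be rechecked from the raw counts in Table 20.5 rather than from the rounded percentages. The sketch below uses illustrative variable names and simply repeats the arithmetic; small differences from the table (for example an RR of about 0.22 rather than 0.23) come from rounding the event rates before dividing.

```python
# Counts from Table 20.5 (an event is a nonunion).
reamed_events, reamed_fractures = 2, 121
unreamed_events, unreamed_fractures = 8, 107

eer = reamed_events / reamed_fractures       # event rate with reaming
cer = unreamed_events / unreamed_fractures   # event rate without reaming

rr = eer / cer          # relative risk, reaming vs. no reaming
rrr = 1 - rr            # relative risk reduction
arr = abs(eer - cer)    # absolute risk reduction
nnt = 1 / arr           # number needed to treat

print(f"RR={rr:.2f} RRR={rrr:.2f} ARR={arr:.3f} NNT={nnt:.1f}")
```

The unrounded NNT comes out at a little over 17, consistent with the figure of 17 quoted in the text.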


A Brief Word on Discussing Results with Patients 


Now that we have discussed how trials can present results, 
it is important to remember that in discussing the risks and 
benefits of procedures with patients we need to take into 
account their own values and preferences. It may be con- 
troversial to discuss, but it is generally acknowledged 
that physicians and surgeons have the ability to sway pa- 
tients toward one treatment or another and this is depen- 
dent, in part, on how we present the risks and benefits to 
them. What is important to our patient? Is it time off 
work, the healing rate, or the risk of death? Clearly, we 


Conclusions 


It is important to understand how data from a study may 
be presented. The presentation not only affects our percep- 
tion of the results, but also affects our ability to translate 
the results to patient care. Expressing data in more easily 
must be able to explain things in a manner that people
can understand. Arguably, this may be in the form of nat- 
ural frequencies, the easiest way being the NNT. Indeed, 
the emphasis has been shifting to the patient more and 
more in clinical decision making.'? We must be able to ex- 
press outcomes in a way that patients can understand; we 
need to strive for clinical evidence focusing on patient- 
oriented outcomes to help our patients and us with these 
decisions. 


understood ways, such as using the number needed to 
treat rather than as potentially confusing probabilities, 
may help busy surgeons to make important treatment de- 
cisions. 
The Confidence Interval Defined


“It is one thing to show a man that he is in an error, and another to put him in possession of
the truth.”


Summary 


In Chapter 20, the different ways used to discuss treatment 
effects including absolute risk reduction, relative risk, and 
number needed to treat were introduced. When we calcu- 
late a value for one of these based on study data, we are 
performing a type of statistical analysis known as estima- 
tion. Estimation involves attempting to quantify a clinical 
effect of interest by examining a sample of the population 


Introduction 


In all clinical research, regardless of the study type being 
employed, the group of patients involved represents only 
a fraction, or sample, of the total population of interest. 
Therefore, the results of measurements in the study, such
as time to radiographic union, postoperative infection
rate, or average score on a health outcome measure, are
hoped to reflect and approximate the true value
of the outcome in the entire population of interest.
Thus, the study data provide an estimate (or point estimate)
of the “truth”: that is, the true value of the
outcome in the population.

Because the group of patients enrolled in a clinical study 
will never include all members of the target population, we 
can never know this true value of the study outcome of in- 
terest; therefore, our estimate will always involve a certain 


The Confidence Interval 


CIs, like the P value, are used to quantify the uncertainty in
a statistical estimate.' Unlike the P value, a CI provides us
with a vital piece of information that allows us not only 
to determine statistical significance, but also to overcome 
some limitations with the use of the P value. It gives us a 
range of values in which the true population value for a sta- 
tistical parameter is likely to lie. 


Jargon Simplified: Confidence Interval 

The confidence interval is “a range of values around the 
sample estimate within which there is reasonable confi- 
dence that the true, but unknown, population value 
lies.”? 


— John Locke 


and describing the uncertainty in this estimate with hy- 
pothesis testing or by constructing a confidence interval 
(CI). In this chapter, the definition and application of CIs 
are presented, some of the advantages of CIs over hypoth- 
esis testing are highlighted, and how CIs can be used to 
help make clinical decisions is discussed. 


level of uncertainty. When describing this uncertainty, or 
how precise the estimate is, researchers rely on two related, 
but different statistical measures, the P value and the con- 
fidence interval. The P value is discussed in Chapter 22. 


Jargon Simplified: Precision and Accuracy 

Precision refers to the ability of repeated measurements 
or experiments to produce a similar result. 

Accuracy refers to the proximity of those measurements 
to the true population value. 


Key Concepts: Common Measures of Statistical 
Precision 

• P value

• Confidence interval


The definition provided above by Doll and Carney? intro- 
duces the following question: What is a reasonable level 
of confidence? As with hypothesis testing and the P value, 
there is a conventional level of confidence. The level that is 
used is a 95% confidence interval. This is the range of values
for the measurements of a study outcome in which we are
95% confident that the true population value lies. That is, if
we were to conduct the same study 100 times, each study
would result in a different point estimate, and these estimates
would be spread around the true, but unknown, value.
Each study would also yield its own CI, and about 95 of
those 100 intervals would contain the true value of the treatment
effect.’ The numbers on either side of the point estimate that




represent the maximal spread of the CI are known as the
confidence limits. Many clinical studies use (arbitrarily)
the 95% CI. This corresponds to a 5% chance that
the true value lies outside of the 95% CI, which, in turn,
corresponds to a P value of 0.05 (that is, a 5% chance of concluding
there is a difference when, in fact, there is not). Thus, if we
have a point estimate of a treatment effect for group A and
another for group B and the 95% CIs do not overlap, we can
conclude (with a P value <0.05) that the groups are statistically
different. If the CIs do overlap, the groups may not be
significantly different; overlap alone does not settle the question,
and the CI for the difference between the groups should then be examined.
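
The idea of repeating the same study many times can be illustrated with a small simulation. The sketch below is an assumption-laden illustration, not part of the original text: the "true" population mean, the standard deviation, the sample size, and the number of repeated studies are all invented, and the interval is built with the usual normal approximation (point estimate plus or minus about 1.96 standard errors).

```python
import random
import statistics


def coverage_of_95_ci(true_mean=50.0, sd=10.0, n=40, n_studies=2000, seed=1):
    """Fraction of simulated studies whose 95% CI contains the true mean."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    z = statistics.NormalDist().inv_cdf(0.975)     # about 1.96
    hits = 0
    for _ in range(n_studies):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        mean = statistics.fmean(sample)            # point estimate
        se = statistics.stdev(sample) / n ** 0.5   # standard error of the mean
        if mean - z * se <= true_mean <= mean + z * se:
            hits += 1
    return hits / n_studies


print(coverage_of_95_ci())
```

The printed fraction should be close to 0.95: roughly 95 of every 100 simulated intervals contain the true value, even though each individual study gives a different point estimate.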


Confidence Intervals for Different Statistical 
Parameters 


The statistical parameter(s) of interest will change de- 
pending on the type of study being employed and the clin- 
ical question being asked. For example, an observational 
case-control study may report the results as ORs, whereas 
a randomized controlled trial will most commonly report the
relative risk (RR), the absolute risk reduction (ARR), or the
number needed to treat (NNT). It may also discuss differences
in means. Regardless of the statistic being used, a CI can be calculated.
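
As one illustration of calculating a CI for a ratio statistic, the sketch below uses a common large-sample approach: the CI is built on the log scale and then transformed back. This method is an assumption of the example, not something the chapter specifies, and the function name is illustrative. The counts are those of the reaming trial summarized in Chapter 20 (8 nonunions in 107 fractures without reaming, 2 in 121 with reaming).

```python
from math import exp, log, sqrt
from statistics import NormalDist


def relative_risk_ci(events_a, n_a, events_b, n_b, confidence=0.95):
    """Relative risk of group A vs. group B with a log-scale CI."""
    rr = (events_a / n_a) / (events_b / n_b)
    # Approximate standard error of ln(RR) for two independent groups.
    se_log_rr = sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    lower = exp(log(rr) - z * se_log_rr)
    upper = exp(log(rr) + z * se_log_rr)
    return rr, lower, upper


rr, lower, upper = relative_risk_ci(8, 107, 2, 121)
print(f"RR={rr:.1f}, 95% CI {lower:.2f} to {upper:.1f}")
```

The result is an RR of about 4.5 with an interval running from roughly 1 to roughly 21, close to the interval of 1 to 20 reported in the trial abstract. With only two events in one group the approximation is rough, so the agreement should be read as indicative only.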


Factors Affecting the Width of a Confidence 
Interval 


As stated earlier, CIs are used to measure imprecision or 
uncertainty in study estimates. The level of precision in 
the sample estimate is reflected by the width of the CI, 
that is, the spread of numbers both above and below the 
point estimate. Intuitively this makes sense; as the preci- 
sion in a study gets higher, the value of the sample statistic 
will get closer to the true population value, so the range of 
values in which we have reasonable confidence that the 
population value lies gets narrower. How is it then, that 
we can increase the precision of our study estimate, nar- 
row the CI, and improve the accuracy of our study results? 
Three main variables influence the width of the CI: (1) the 
level of variability within the sample population, (2) the le- 
vel of confidence required in the study, and (3) the sample 
size. 

The variability within the sample population is reflected 
in the variance or standard error of the estimate. Because 
all individuals recruited into clinical studies are different, 
variability can never be eliminated and is often out of the 
control of the researcher. However, selecting a sample of 
participants that is more homogeneous or improving the 
reliability of measurements on which our sample data 
are based can reduce variability and lead to a narrower 
CI? Caution must be exercised if selection of study partici- 
pants is altered to decrease variability, as this may affect 
our ability to apply the study results to the population at 
large (that is the generalizability of the study) and could 
limit the use of study data to make clinical decisions. 


A second way to narrow the CI is to decrease the level of 
confidence we have that the true population value lies in 
our interval (e.g., from a 95% CI to 90%). Again, the relation- 
ship between level of confidence and interval width is in- 
tuitive. To be 95% confident that our interval includes the 
true population value of our statistic, the CI must be a cer- 
tain width. If we then wish to increase our level of confi- 
dence to 99%, the width of the CI must increase to include 
more values that might represent the true population va- 
lue of the statistic of interest. If we want a narrow CI, we 
may consider dropping our required level of confidence 
to 90%. Although doing so may still allow us to determine 
clinical significance from our CI, we must remember 
when using the CI to test for statistical significance that a 
90% CI corresponds to a P value of 0.10, which may not 
be low enough to make meaningful conclusions about 
the study. 

The width of a CI can also be increased or decreased by 
changing the sample size of the study. As the CI is a mea- 
sure of precision in an estimate that is calculated from 
data collected from a sample population, it makes sense 
that as the size of the sample population changes to in- 
clude fewer or more people from our target population, 
the level of precision and therefore the width of the CI 
will also change. As the sample size increases, the precision 
in our estimate increases and the width of the CI decreases. 
Conversely, if the number of participants in a study is decreased,
our estimate becomes less precise and the width
of the CI increases to maintain the same level of confidence.

One important point to remember regarding CIs is that
for the CI to be a useful statistical tool, the point estimate 
on which it is based must be calculated using data collected 
in an unbiased manner. If systematic errors, such as poor 
randomization techniques, failure to blind outcome asses- 
sors, or the use of outcome measures that are not validated 
in the study patient population, are present, the point es- 
timate may lie at some distance from the true population 
value of the parameter.’ If this is the case, decreasing the 
level of confidence required or recruiting more patients 
cannot make up for poor study design. 


Key Concepts: Factors Affecting the Width of a 
Confidence Interval 

• Variability within the sample population

• Sample size

• The level of confidence required
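
The three factors listed above can be seen together in the usual formula for the CI of a mean, in which the half-width of the interval is a multiplier (set by the confidence level) times the standard deviation divided by the square root of the sample size. The sketch below is illustrative only: the function name and all numbers are hypothetical, and it uses the normal approximation.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(sd, n, confidence=0.95):
    """Half-width of a CI for a mean under the normal approximation."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sd / sqrt(n)


# Hypothetical outcome with a standard deviation of 10 points.
for n in (25, 100, 400):
    for confidence in (0.90, 0.95, 0.99):
        width = ci_half_width(sd=10, n=n, confidence=confidence)
        print(f"n={n:3d} confidence={confidence:.2f} half-width={width:.2f}")
```

Reading across the output, quadrupling the sample size halves the width, a lower confidence level narrows the interval, and a smaller standard deviation (less variability) would narrow it in direct proportion.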


Advantages of the Confidence Interval 


Aside from providing us with information about the range 
of values in which a population statistic is likely to lie and 
the direction of the difference in the statistic of interest be- 
tween the groups studied, CIs can be used to answer two 
other questions: (1) Are the results of the study statistically 
significant? (2) Are the results of the study truly positive or 
negative? 
When answering the question, “Are the results of the
study statistically significant?” the CI is functioning as a
hypothesis test. We can say that for a 95% CI we are 95%
confident that the population statistic lies within the interval.
Equivalently, there is a 5% chance that the true
population value lies outside of the CI: 2.5% above the
upper confidence limit and 2.5% below the lower confidence
limit. This 5% corresponds to a P value of 0.05.
Therefore, if the 95% CI does not include the value
of no effect, the results can be deemed statistically significant
at the 5% level. Conversely, if the 95% CI does include
the value of no effect, the results are not statistically
significant. For study statistics that describe an absolute
value or a difference (the mean, mean difference, absolute
risk reduction [ARR], and RR reduction) the value of no
effect is 0, whereas for study statistics that are ratios
(the RR and OR) the value of no effect is 1.' By examining the range
of values of the CI, we are able to infer statistical significance
in addition to knowing the largest and smallest clinical
effects that are likely given the study data.


Key Concepts: Inferring Statistical Significance from 
Confidence Intervals 

• For study statistics that present an absolute value or a
difference (the mean, mean difference, absolute risk
reduction, and relative risk reduction), the value of the
null hypothesis is 0. If 0 is not included
in the confidence interval, the results of the study are
statistically significant. If 0 is included in the interval,
the results are not statistically significant.

• For study statistics that are ratios (the relative risk
and odds ratio), the value of the null hypothesis is 1.
If 1 is not included in the confidence interval, the
results of the study are statistically significant. If 1 is
included in the interval, the results are not statistically significant.
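
The rule in the box above amounts to a single check: does the interval contain the value of no effect? A minimal sketch follows, with an illustrative function name. The two mean-difference intervals are taken from Table 21.1 in this chapter; the relative-risk interval is hypothetical.

```python
def is_statistically_significant(lower, upper, no_effect_value):
    """True when the CI excludes the value of no effect (0 or 1)."""
    return not (lower <= no_effect_value <= upper)


# Mean differences (value of no effect = 0), from Table 21.1.
print(is_statistically_significant(3.2, 21.2, 0))       # pain subsection
print(is_statistically_significant(-10.15, 11.45, 0))   # physical functioning

# A hypothetical relative risk interval (value of no effect = 1).
print(is_statistically_significant(0.6, 1.2, 1))
```

The pain comparison excludes 0 and is significant; the physical functioning comparison and the hypothetical relative risk both include their value of no effect and are not.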


As mentioned earlier, the P value does not provide us with 
information on the magnitude of effect observed and 
therefore cannot be used to determine clinical significance 
or whether the results of the study are definitive. This is 
not the case with CIs, which can be used to answer ques- 
tions about clinical significance and to determine if the po- 
sitive or negative results of a study are real, or whether a 
larger study is needed to definitively rule in or out a statis- 
tical difference. 

When examining a positive study (i.e., a study that finds 
a difference between treatment groups) in which the re- 
sults are found to be statistically significant, we can deter- 
mine whether the study results are definitive and whether 
they are clinically significant by looking at the lower confi- 
dence limit, which is the smallest true difference between 
the groups that is compatible with the study data.’ If the 
lower confidence limit is above the minimum treatment 
effect we would require before changing clinical practice, 
then we can say that the results are both statistically and 
clinically significant. If, however, the minimum treatment 
effect we require falls within the CI - although the results 
may still be statistically significant - we conclude that the
study is not definitive and further investigation with a lar- 
ger sample size, which we know will narrow the width of 
the CI, is required. 

To interpret negative study results in which the treat- 
ment effect is found to be statistically insignificant, we 
turn to the upper confidence limit, the largest true differ- 
ence consistent with the study data, to determine whether 
the results of the study are definitive. If the upper limit of 
the CI falls below what we would consider a clinically im- 
portant treatment effect, then the results of the study are 
considered definitive and no further investigation is re- 
quired. If, however, the CI contains the minimally impor- 
tant treatment effect, the study has not ruled out the exis- 
tence of a clinically important treatment effect and a larger 
study is needed. 
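
The reasoning in the last two paragraphs can be written as a small decision rule. The sketch below makes several assumptions that are not in the original text: the effect is a difference for which 0 means no effect and larger values mean more benefit, the function name is illustrative, and the minimally important effect of 10 points used in the examples is purely hypothetical.

```python
def interpret_study(lower, upper, minimal_important_effect):
    """Classify a study from its confidence limits (difference scale)."""
    if lower > 0:
        # Positive study: look at the lower confidence limit.
        if lower > minimal_important_effect:
            return "positive, definitive"
        return "positive, not definitive"
    if upper < 0:
        return "significant in the opposite direction"
    # Negative study: look at the upper confidence limit.
    if upper < minimal_important_effect:
        return "negative, definitive"
    return "negative, not definitive"


# Confidence limits from Table 21.1, with a hypothetical threshold of 10 points.
print(interpret_study(3.2, 21.2, 10))       # pain subsection
print(interpret_study(-10.15, 11.45, 10))   # physical functioning
```

Under this assumed threshold both comparisons are classified as not definitive, one positive and one negative, so a larger study would be needed in either case.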


Examples from the Literature: Example of the Use of a 
95% Confidence Interval 

Source: Hodgson SA, Mawson SJ and Stanley D. Rehabi- 
litation after two-part fractures of the neck of the hu- 
merus. J Bone Joint Surg. Br. 2003;85:419-422.° 


After a two-part fracture of the humerus, a period of
immobilization is recommended, though the optimal length
of this period is not defined. This randomized controlled
trial, conducted in 2003, enrolled 86 patients and randomized
them to receive either immediate physiotherapy within
1 week (group A) or delayed physiotherapy after 3 weeks
of immobilization in a collar and cuff sling (group B).
Sixteen-week results for the primary outcome, the Constant
Shoulder Score, subsection results from the outcome
measure, and 95% CIs for the mean difference are
presented in Table 21.1.


By examining the P values in the right-hand column of
the table, we see that those individuals starting immediate
physiotherapy after suffering a two-part fracture of
the humerus had statistically significantly higher results
on the Constant Shoulder Score and on the pain subsection
of the questionnaire when compared with individuals
who had 3 weeks of immobilization. However,
there was no statistically significant difference on the
physical function or mental health subsets of the scale.
When we apply the rules learned earlier for interpreting
positive and negative results with the 95% CI, and look at
the lower confidence limits for the two positive comparisons,
we see that the CIs for the Constant Shoulder Score
and pain do include results that may be smaller than
what one might consider a minimally important treatment
difference. Therefore, these results may be neither
definitive nor clinically significant. Furthermore, if
we examine the upper confidence limits for physical
functioning and mental health, we see that the true
population values for the mean differences could be as high
as about 11 points, which many would consider clinically
significant; therefore, the study has failed to rule out a
clinically important treatment effect and further investigations
are necessary.

Table 21.1 Sixteen-Week Results for Primary Outcome after Two-Part Fractures of the Neck of the Humerus

Outcome                    Group A (n=42)   Group B (n=40)   Mean Difference (95% CI)   P Value of Difference
Constant shoulder score    0.70             0.54             0.16 (0.068 to 0.25)       0.001
Physical functioning       69.9             69.2             0.65 (-10.15 to 11.45)     0.9
Pain                       72.0             59.9             12.2 (3.2 to 21.2)         <0.01
Mental health              74.0             71.2             2.8 (-5.7 to 11.3)         0.51

Source: Data from Hodgson SA, Mawson SJ, Stanley D. Rehabilitation after two-part fractures of the neck of the humerus. J Bone Joint Surg
Br 2003;85:419-422.


Key Concepts: Interpreting Positive and Negative
Studies with a Confidence Interval

Positive studies

• If the lower confidence limit is above what we would
consider the minimally important treatment effect,
the study shows clinical significance and the results
are definitive.


Putting It All Together 


Let us assess the results section of a meta-analysis done 
comparing arthroplasty and internal fixation in the treat- 
ment of displaced subcapsular hip fractures (Fig. 21.1).'° 
Fourteen studies are included in this meta-analysis; the
data are presented in a forest plot. The point estimates
of the treatment effects (that is, the RR of reoperation
comparing arthroplasty and internal fixation) are given
along with their 95% CIs. The center line at 1 indicates no
difference between the groups. Thus, if the CI for an
individual study includes 1, there is a chance the “true
value” is 1 and there is no statistical difference between
the groups in that study. From this representation, it
becomes immediately apparent what the magnitude of the
treatment effect is, the direction of that effect, and
whether that effect is statistically significant. For instance,
we note that all point estimates lie to the left of 1 and
thus favor arthroplasty. That is, there is a reduction in
the risk of reoperation with arthroplasty as compared with
internal fixation when treating a displaced femoral neck fracture. Having
said this however, we note that for some of the studies, 


• If the lower confidence limit is below the minimally
important treatment effect, the results are not definitive
and a larger study is necessary.


Negative studies

• If the upper confidence limit is below the minimally
important treatment effect, the results of the study
are definitive.

• If the upper confidence limit is above the minimally
important difference, the study has failed to exclude
a clinically significant treatment effect and a larger
study is required.°


the CIs include 1; therefore, they have not achieved statis- 
tical significance. 

Another interesting observation is that the studies with
the highest number of patients have the narrowest CIs,
illustrating the importance of sample size and the role it
plays in increasing the precision of the results. A meta-analysis
statistically pools the data of individual trials to give
effectively one large trial. We can see that the confidence
limits of the pooled estimate reflect this.

From this plot, we can also assess the clinical significance
using the confidence limits. Let us say, for example, that a 20%
reduction in the risk of reoperation is clinically important
(this is arguable and only hypothetical). This corresponds
to an RR of 0.8. We only need to look at the CIs to see if
they include this number; if they do, the study may not be
clinically significant. However, if the entire CI falls below
this number, then we can say the study is both clinically
and statistically significant. This illustrates the
importance of using CIs.
[Fig. 21.1: forest plot. Horizontal axis: Relative Risk (95% confidence interval), log scale from 0.001 to 10; values to the left of 1 favor arthroplasty and values to the right favor internal fixation. Studies shown: Davison 2001 (n = 280), Ravicumar 2000 (n = 271), Van Vugt 1993 (n = 43), Parker 2000 (n = 208), Van Dortmont 2000 (n = 60), Tidermark 1999 (n = 39), Sikorski 1981 (n = 218), Soreide 1979 (n = 104), Johansson 2000 (n = 100), Neander 1997 (n = 20), Jonsson 1996 (n = 47), …ogra 2002, Puclakka 2001 (n = 32), Jansen 1984 (n = 102), and the pooled estimate (n = 1933).]

Fig. 21.1 The effect of arthro… (From Bhandari M, Devereaux PJ, Swiontkowski MF, et al. Internal fixation compared with arthro… J Bone Joint Surg Am; 2003; 85:1673-1681. Reprinted by permission.)





Conclusions 


CIs are useful tools that describe the level of precision in a 
statistical estimate by providing a range of values in which 
the true population value of an outcome is likely to lie. CIs 
can be calculated for a wide range of statistical parameters 
such as the mean, mean difference, RR, and NNT, and offer 
several advantages over conventional hypothesis testing 
and P values. CIs can be used to infer clinical significance 
22
The P Value Defined 


“Every science and every inquiry, and similarly every activity and pursuit, is thought to
aim at some good.”


Summary 


In this chapter, the definition and use of P values in hypothesis
testing are presented. There is also a brief discussion of
the limitations of using the P value.


Introduction 


When carrying out a research study of an experimental 
clinical design, we most commonly have two groups of 
people each representing a sample of the population. 
One group gets treatment A and the other group either 
acts as a control (in drug trials the control group usually receives
a placebo; this is difficult in a surgical trial) or receives
treatment B (in orthopaedic trials this could be two differ- 
ent types of internal fixation, two different postoperative 
regimes, etc.). An outcome to study has been set at the start 
of the trial, for instance, radiographic measures (e.g., non- 
union), functional measures, reoperations, and so on, and 
we assess how the two groups do in comparison to each 
other. We can then create two different hypotheses: there 
is no difference between the groups (the null hypothesis) 
or there is a difference between the groups (the alternative 
hypothesis). The null hypothesis refers to the presumed 


P Values 


Hypothesis testing involves determining if the null hy- 
pothesis - no difference in the study parameter between 
the groups - is true. The P value is the probability of ob- 
served differences in the study groups being due to 
chance. 


Jargon Simplified: P Value 

The P value is the probability, under the assumption of 
no difference, of obtaining a result equal to or more ex- 
treme than what was actually observed.? 


By convention, the largest acceptable probability that the 
results we have obtained are due to chance is 0.05, that is 
to say a5% chance of finding a difference because of chance. 
This value, also known as the alpha (a) value, is the prob- 
ability of concluding that there is a difference between the 


— Aristotle 


default of nature’; for example, that an accused person is 
innocent, or that a nonsteroidal antiinflammatory drug 
(NSAID) offers no benefit in treating osteoarthritis. The al- 
ternative hypothesis is the opposite of the null hypothesis; 
for example, that an accused person is guilty or that there 
is a benefit of the NSAID in the treatment of osteoarthritis. 
Generally, the researcher is most interested in the alterna- 
tive hypothesis.' 


Jargon Simplified: Null Hypothesis and Alternative 
Hypothesis 

The null hypothesis assumes no effect or difference be- 
tween samples, whereas the alternative hypothesis sug- 
gests there is a difference or effect between samples. 
Usually, it is the alternative hypothesis that the re- 
searcher is interested in proving to be true. 


groups when in fact no difference exists. This mistake is 
known as type I error and will be discussed in more detail 
in Chapter 23. When our P value is <0.05 we say that this is 
statistically significant and reject the null hypothesis that 
there is no difference between the groups studied. That 
is to say, there is a less than a 5% chance that the differences 
observed between groups are due to chance alone. Conver- 
sely, results are said to be not statistically significant when 
the calculated P value is greater than 0.05 and we accept 
the null hypothesis.* This relates to the comfort level 
most people have in believing the results. Do we accept 
that 5% of the time we could be wrong; we see a difference, 
but the truth is there is not one? Depending on the out- 
come being studied, there could be significant implications 
to this. These will be discussed in more detail in Chapter 
24. 
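
The definition of the P value given above can be made concrete with a permutation test. The sketch below is illustrative only: the function name `permutation_p_value` and the outcome scores are hypothetical, and a permutation test is just one of several ways a P value can be obtained. It shuffles the group labels many times (which is what "no difference" implies) and reports the share of shuffles whose difference in means is at least as extreme as the one observed.

```python
import random


def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Two-sided permutation P value for a difference in means.

    Under the null hypothesis the group labels carry no information,
    so relabelling the patients at random shows which differences
    chance alone can produce.
    """
    rng = random.Random(seed)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    pooled = list(group_a) + list(group_b)

    as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            as_extreme += 1
    return as_extreme / n_shuffles


# Hypothetical functional scores for two treatment groups.
treatment_a = [72, 75, 78, 80, 81, 83, 85, 88]
treatment_b = [65, 68, 70, 71, 74, 76, 77, 79]
p = permutation_p_value(treatment_a, treatment_b)
print(f"P = {p:.3f}")
```

If the printed value is below the chosen α (conventionally 0.05), the null hypothesis of no difference would be rejected for these made-up data.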
Examples from the Literature: Example of the Use of
the P Value

Controversy regarding the optimal treatment of fresh
total Achilles tendon ruptures remains. To compare the
results of a new modified percutaneous suturing technique
with open repair for acute complete Achilles tendon
ruptures, Cretnik et al performed a prospective
observational study in which rates of complication, rerupture,
and sural nerve disturbance were examined. One hundred
and thirty-two patients underwent percutaneous
repair and 105 had an open repair. Rates of complication,
rerupture, and sural nerve disturbance are presented in
Table 22.1.

By examining the P values in the right column of the
table, we see that only one number, the one for total
complication rate (0.03), falls below our conventional cutoff
for statistical significance of 0.05. The other values, for
rates of rerupture and sural nerve disturbance, fall above
the 0.05 cutoff. Therefore, we can conclude that there is a
statistically significant increase in total complication
rates for patients undergoing open repair of acute complete
Achilles tendon rupture compared with percutaneous
repair (this P value says that, if there were truly no
difference, a difference at least this large would be seen
by chance only 3% of the time). The differences in rates
of rerupture and sural nerve disturbance are not
statistically significant at a level of 0.05.
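
As a rough check on Table 22.1, the P value for total complications can be approximated with a two-proportion z test. This is only a sketch: the event counts (6 of 132 and 13 of 105) are back-calculated from the reported percentages rather than taken from the paper, the helper name `two_proportion_p_value` is hypothetical, and the authors may have used a different test.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(events_1, n_1, events_2, n_2):
    """Two-sided P value from a pooled two-proportion z test."""
    p_1, p_2 = events_1 / n_1, events_2 / n_2
    pooled = (events_1 + events_2) / (n_1 + n_2)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p_1 - p_2) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Counts inferred from Table 22.1 (assumption): 4.5% of 132 and 12.4% of 105.
p_complications = two_proportion_p_value(6, 132, 13, 105)
print(f"Total complications: P = {p_complications:.3f}")
```

Under these assumptions the result lands close to the reported value of 0.03.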


Limitations of Hypothesis Testing 


Although the use of hypothesis testing and the presentation
of a P value to determine statistical significance, or
lack thereof, is standard in clinical research, there are limits
to the information P values provide, and care must be taken
when interpreting the meaning of a significant or
nonsignificant finding on a hypothesis test.


Table 22.1 Rates of Rerupture, Total Complications, and
Sural Nerve Disturbance in Percutaneous Repair versus
Open Repair of the Ruptured Achilles Tendon

Outcome (rate, %)         Percutaneous Repair   Open Repair   P Value
                          (n = 132)             (n = 105)
Rerupture                 3.7                   2.8           0.680
Total complications       4.5                   12.4          0.030
Sural nerve disturbance   4.5                   2.8           0.487


Source: Data from Cretnik A, Kosanovic M, Smrkolj V. Percuta- 
neous versus open repair of the ruptured Achilles tendon. Am J 
Sports Med 2005; 33(9):1369-1379. 
First, there can be a tendency in the literature and
among physicians to interpret statistically significant
treatment effects as clinically important. If the sample
size in a given study is large enough, even very small
treatment effects may be found to be statistically significant;
despite this, the benefit for each individual patient may
not be large enough to change clinical practice, and the
results of the study could be said to be clinically insignificant.

A second common error made when interpreting
P values is to conclude that, when the difference between
two groups is not statistically significant, the groups
are the same. Because the P value depends on the magnitude
of the effect as well as the sample size, important
benefits of treatment may be missed if the number of
patients recruited into the study is small, which is often the
case in orthopedic surgery. In such cases, rather than
interpreting a nonsignificant P value as evidence of a lack of
difference between the groups, we should take it to
represent a lack of evidence that there is a difference.
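
Both points can be seen in a short numerical sketch. It assumes a hypothetical outcome score with a standard deviation of 15 points in each group and a fixed true difference of 1 point between group means (both numbers are invented for illustration), and uses a two-sample z test with known standard deviation via the hypothetical helper `p_value_for_mean_difference`.

```python
from math import sqrt
from statistics import NormalDist


def p_value_for_mean_difference(difference, sd, n_per_group):
    """Two-sided P value from a two-sample z test with known, equal SDs."""
    std_err = sd * sqrt(2 / n_per_group)
    z = abs(difference) / std_err
    return 2 * (1 - NormalDist().cdf(z))


# Same 1-point difference (hypothetical), very different sample sizes.
for n in (50, 500, 5000):
    p = p_value_for_mean_difference(difference=1.0, sd=15.0, n_per_group=n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n = {n:>4} per group: P = {p:.4f} ({verdict})")
```

The observed difference is identical in every row; only the sample size changes whether it crosses the 0.05 threshold. A 1-point difference may still be clinically unimportant at 5000 patients per group, and a larger, important difference could be missed at 50.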


Examples from the Literature: Example of the 
Limitations of the P Value 

The risks and benefits of early internal fixation of acute
scaphoid fractures versus the standard of cast immobilization
have not been established. Dias et al. performed a
randomized controlled trial in which 88 patients with an
acute scaphoid fracture were randomized to treatment
with cast immobilization or internal fixation with a Herbert
screw and no cast. Study findings for range of motion
and grip strength as a percentage of the opposite
side and level of pain measured with a visual analogue
scale at 52 weeks postinjury are presented in Table 22.2.
Examination of the P values for comparison of the scores
between the two groups reveals that grip strength was
the only parameter for which a significant difference
between the treatment groups was found (i.e., it was the
only comparison that resulted in a P value less than
0.05). This difference can therefore be said to be
statistically significant. However, when we examine the mean
scores for grip strength, we find that internal fixation
affords only a 7.5 percentage point increase in grip strength
(99.1% versus 91.6% of the opposite side) compared
with cast immobilization. Achieving this level of
improvement, though statistically significant, may not be
worth the increased risks associated with undergoing a
surgical procedure; therefore, the results may be said
to be clinically insignificant. Furthermore, although the
differences in pain and range of motion between the
two treatment groups were not found to be statistically
significant, we cannot conclude that there is truly no
difference between the groups, only that this study failed
to provide evidence for a difference.
Table 22.2 Study Findings for Range of Motion and Grip Strength as a Percentage of the Opposite Side and Level of Pain
Measured with a Visual Analogue Scale at 52 Weeks Postinjury

Outcome           Cast Immobilization (n = 44)   Internal Fixation (n = 44)   P Value
Range of motion   93.4                           94.1                         0.814
Pain              1.83                           2.18                         0.551
Grip strength     91.6                           99.1                         0.029

Source: Data from Dias JJ, Wildin CJ, Bhowal B, Thompson JR. Should acute scaphoid fractures be fixed? A randomized controlled trial.
J Bone Joint Surg Am 2005;87:2160-2168.


Conclusions 


Thus, we can see that the P value helps guide us in knowing
whether the measurements obtained from each
treatment group are statistically significantly different
from each other. The value we set the P value at tells
Errors in Hypothesis Testing


“All men are liable to error...”

— John Locke


Summary


We have seen from preceding chapters that in clinical trials
of therapy, commonly two groups are compared. A treatment
is given to one group and the outcomes are then
compared between the groups by way of hypothesis testing.
In this chapter, some of the errors that can be made
in hypothesis testing and ways to recognize them in the
literature are explored.


Introduction


Scientists and clinicians alike will test hypotheses to see if
they are true, helping to provide a scientific basis for their
work and practice principles. To test a hypothesis, a sample
is selected representing the total population to which the
results will be applied. Because the study is carried out on
the sample and not on every member of the population,
there is always a chance that the conclusion will be
incorrect: the null hypothesis will be concluded to be true
when it is in fact false (“false negative”) or concluded to be
false when it is in fact true (“false positive”). One of the
goals of testing a hypothesis is to minimize the chances of
arriving at the wrong conclusion.


Errors in Hypothesis Testing


When referring to errors in scientific literature, one is not
referring to mistakes in observation or technique, but
rather to the deviation of an individual observation or
sample group from what is typical of the population. In
general, scientists recognize two types of error, systematic
and statistical. Systematic error refers to the difference
between the observed measurement and the “true”
measurement due to nonrandom fluctuations from an unknown,
but potentially controllable, source. On the other hand,
statistical error refers to the difference between the
observed measurement and the true value caused by random
and unpredictable fluctuations.


Jargon Simplified: Statistical Error

Statistical error occurs when differences between samples
are due to chance and are not real. Statistical error
does not refer to errors in research design or methodology.


Statistical error is an essential consideration in the context
of hypothesis testing. Recall from Chapter 22 that when
testing a hypothesis, two terms are frequently used: the
null hypothesis and the alternative hypothesis.

To determine which hypothesis is true, an experiment is
performed; the results lead the researcher to one of four
conclusions. There are two possible correct conclusions:
(1) no benefit is discovered when in fact there is none, or
(2) a benefit is discovered when a benefit does exist. In
these situations, the clinical study results reflect what is
true in nature. If, on the other hand, the results do not
reflect what is true in nature, there will have been one of two
incorrect conclusions: (1) the treatment is concluded to be
beneficial when it is not, or (2) it is concluded to offer no
benefit when it actually does. These errors are labeled
type I or α error, and type II or β error, respectively.

Researchers Neyman and Pearson first identified these
two sources of potential error in hypothesis testing in
the late 1920s, which they described as the error of
rejecting a hypothesis that should have been accepted and the
error of accepting a hypothesis when it should have been
rejected. They further developed this concept in
subsequent studies, stressing the importance of designing tests
so as to minimize the occurrence of such errors.


Types of Statistical Error 


As mentioned above, at least two main kinds of statistical
error can occur in hypothesis testing. The first type, the
alpha error, is also known as type I error, error of the first
kind, or a false positive error. This type of error occurs
when the null hypothesis is rejected when in fact in nature
it is true. That is, one concluded that the alternative
hypothesis was true when in fact it was not and the actual test
results arose as a result of chance alone. For example, in a
trial of glucosamine tablets in the treatment of osteoarthritis
of the knee, a researcher would commence by establishing
the null hypothesis that there is no benefit from
glucosamine, and the corresponding alternative hypothesis
would be that there is a benefit of the pill in the treatment
of that condition. If the researchers conclude from their
experiment that there is a benefit, they would reject the null
hypothesis in favor of the alternative and conclude that
glucosamine is useful for the treatment of osteoarthritis.
If this conclusion is, in fact, false, and the result observed
is due to chance (there is actually no benefit from
glucosamine), the null hypothesis was incorrectly rejected, and
a type I or α error has occurred.

As a point of terminology, it is important not to confuse
the terms α and α error. Alpha error is synonymous with
type I error (as described above), whereas α refers to
the probability of committing a type I error by chance.
When properly used, the α level will be set before the
experiment is started so that the choice of the critical
value of α is objective and not influenced by the data.
Regardless of the particular scientific test being performed,
it is important to keep α small and thereby
minimize the chance that a type I error will occur.


Jargon Simplified: Alpha Error 

Alpha error, or type I error, occurs when the null hy- 
pothesis is concluded to be false when it is true. The re- 
searcher concludes there to be a difference or treatment 
effect when there is not. Alpha refers to the probability 
of committing such an error, and by convention is 
usually set at 0.05. 
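
The meaning of α as a long-run error rate can be checked by simulation. The sketch below is an illustration under stated assumptions: it repeatedly simulates a hypothetical trial in which both groups are drawn from the same normal distribution (so the null hypothesis is true by construction), tests each trial with a two-sample z test with known standard deviation, and counts how often P falls below 0.05. The function name and all numeric settings are invented for the example.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulated_type_i_error_rate(n_trials=4000, n_per_group=30,
                                sd=10.0, alpha=0.05, seed=1):
    """Share of null trials (no true difference) declared significant."""
    rng = random.Random(seed)
    std_err = sd * sqrt(2 / n_per_group)
    normal = NormalDist()
    false_positives = 0
    for _ in range(n_trials):
        # Both groups share the same true mean of 50: any difference is chance.
        group_a = [rng.gauss(50.0, sd) for _ in range(n_per_group)]
        group_b = [rng.gauss(50.0, sd) for _ in range(n_per_group)]
        diff = sum(group_a) / n_per_group - sum(group_b) / n_per_group
        p_value = 2 * (1 - normal.cdf(abs(diff) / std_err))
        if p_value < alpha:
            false_positives += 1
    return false_positives / n_trials


rate = simulated_type_i_error_rate()
print(f"Significant results under a true null: {rate:.1%}")
```

With α set at 0.05, the printed share should settle near 5%: roughly 1 in 20 of these trials reports a difference that does not exist.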


By way of analogy, consider the legal system and the
presumption of innocence. Under that system, the prosecutor
must prove the guilt of the accused and the most unpalatable
error would occur if the conclusion of guilt was made
when the accused was in fact innocent; that is a type I or
α error. Because of the gravity of the potential error,
researchers should set α at a low level and set up their
scientific testing accordingly so as to minimize the chance of
concluding there is a positive effect when there is none.
Conventionally, the α level of 0.05 has been widely used
as an accepted level in medical literature; however,
researchers can set other levels as long as they are justified
in the particular circumstances of a given case. When
setting α at 0.05, the researcher is accepting the fact there is a
5 in 100 chance that they will incorrectly conclude there is
a difference, or benefit, of the treatment in question when
in fact there is not. Of course, this concept does not account
for systematic errors and refers only to statistical error.

The second type of statistical error that can occur is type
II error, also known as error of the second kind, β error, or
false negative. This type of error occurs when the null
hypothesis is not rejected when it is false; there is a failure to
accept the alternative hypothesis when it is true. Consider
the previous example of a trial on osteoarthritis of the knee
and treatment with glucosamine. If the researcher
concludes that there is no benefit when in nature there is,
then a type II or β error has occurred.

Again, one must not confuse the terms β error and β. The
probability of this error occurring by chance alone is
denoted by β, which is the probability of concluding there
is no difference when in fact there is, or a false negative.
Like α, β is set prior to collecting data and is used to assess
the power of an experiment. Power refers to the ability of a
test to detect a difference when one actually exists and is
denoted by 1 − β. The power of a test is often used to
determine the sample size required for an experiment.


Jargon Simplified: Beta Error 

Beta error or type II error occurs when the researcher ac- 
cepts the null hypothesis when it is false; he or she con- 
cludes there to be no difference or treatment effect, 
when there is. Beta refers to the probability of commit- 
ting such an error and by convention is usually set at 
0.20. 


Consider the legal system: if the accused person is found
innocent when he or she is actually guilty, a type II error
has occurred. To limit this possibility, researchers will
set β prior to data collection. Although keeping β
low minimizes the probability of committing a type II
error, it has the corresponding impact of requiring an
increased sample size for the experiment. Because a type II
error is considered somewhat less problematic than a
type I error (releasing a guilty person versus convicting an
innocent person), β is often set at a higher level than α.
Beta is often set at 0.2 or 0.1; that is, there is a 20 in 100
chance or a 10 in 100 chance of concluding there is no effect
when one actually exists.

Table 23.1 illustrates the relationship among error types
and their probabilities dependent on the conclusion of the
experiment and the truth.


Table 23.1 Summary of Alpha and Beta Errors

                     Decision: null true   Decision: null false
Truth: null true     Correct               Type I error
Truth: null false    Type II error         Correct


Test Power


When interpreting medical literature most emphasis is
placed on the α error, and often little if any attention is
paid to the β error. This is not unfounded. The most
important task for a researcher is to decide first what the
acceptable risk is of rejecting the null hypothesis when it is
true, that is, of falsely concluding there is a difference and
altering clinical practice based on a false result that is due to
chance alone and not a true effect. This value must be set
prior to the start of the data collection. Of secondary concern
is the acceptable probability of committing a type II error, or
β error, that is, concluding there is no difference and
defaulting to nature when a difference actually exists. Again,
β should be set before the study commences. It represents the
probability of committing a type II error. Consequently,
setting β at different levels will determine the ability of the
experiment or test to detect a difference.

This introduces the concept of the power of a study: the
ability of the study to detect a difference when one exists.
The power of a test is designated as 1 − β, that is, the
probability that the researcher will reject the null hypothesis
when it is false. If a researcher sets β at 0.2, then the power
of the study would be 0.8 or 80%. This means that there is
an 80% chance that the study will detect a true difference of
the previously set magnitude with the chosen sample size,
and consequently there will be a 20% chance of failing to
detect a difference when one exists. Again, the β error should
be considered before the study starts, in the selection of the
sample size. The greater the sample size, the more likely one
is to detect a difference in the study if one exists, and the
less likely to commit a β error. This is easy to do before
starting the study. However, it may increase the sample size
significantly; thus it should be considered prior to initiating
the study.


Jargon Simplified: Power of a Study 

Power of a study refers to the ability of a study to detect a
difference or treatment effect when one actually exists.
It should be calculated prior to the start of the study as
1 − β. When β is set at 0.20, there is an 80% chance that
the study will detect a difference or treatment effect
when one exists.
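
The relationship between sample size, β, and power can be sketched numerically. The example below assumes a two-sample z test with a known, common standard deviation, and every number in it (a 5-point true difference, a standard deviation of 12 points, the candidate sample sizes) is hypothetical; the helper `power_two_means` is an illustrative name, not a library function.

```python
from math import sqrt
from statistics import NormalDist


def power_two_means(difference, sd, n_per_group, alpha=0.05):
    """Power (1 - beta) of a two-sided, two-sample z test with known SD."""
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)
    shift = abs(difference) / (sd * sqrt(2 / n_per_group))
    # Probability that the test statistic lands in either rejection region.
    return normal.cdf(shift - z_crit) + normal.cdf(-shift - z_crit)


for n in (20, 50, 91, 150):
    power = power_two_means(difference=5.0, sd=12.0, n_per_group=n)
    print(f"n = {n:>3} per group: power = {power:.2f}, beta = {1 - power:.2f}")
```

Under these assumptions, power climbs with sample size and reaches roughly 80% (β of about 0.20) at around 91 patients per group; smaller groups leave a larger chance of a type II error.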


Sample Size 


Sample size is a critical element of hypothesis testing as it
bears on the possibility of error and the scientific value of
the results that are generated. Sample size must be
considered before the study starts; otherwise, one may find
that the results are useless and the study was a complete
waste of resources. There are numerous factors to be
considered when selecting the sample size. First, the outcome
measure needs to be selected. Once it is selected, one must
decide what is a clinically significant difference: what value
will be deemed significant in clinical practice. For example,
decreasing infections in total hip replacements from 0.5% to
0.4% is a very small difference and would require huge
numbers, and may not be clinically relevant. If, however, a
new treatment for slipped capital femoral epiphysis (SCFE)
would reduce the avascular necrosis rate from 50% to 10%,
the numbers would be smaller and the difference obviously
clinically relevant. In addition to what is a clinically
significant difference, one must also set α and β and consider
the type of data and the variance of the data. In general, the
variance may be known, or it may not be known and must be
estimated. Alpha is commonly set at 0.05 and β at 0.20,
thereby leaving the researcher to determine the size of the
detectable difference between groups. As in the SCFE
example, if this difference is large the sample size required
will be smaller, and if it is small, the sample size
needed will be much larger.
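
The two scenarios above can be put through a standard normal-approximation formula for comparing two proportions: n per group = (z_alpha/2 + z_beta)^2 x [p1(1 - p1) + p2(1 - p2)] / (p1 - p2)^2. This is a sketch rather than the chapter's own calculation: the choice of formula, the function name `n_per_group_two_proportions`, and the defaults of α = 0.05 and β = 0.20 are assumptions, and other formulas (for example, with a continuity correction) give somewhat different numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_two_proportions(p_1, p_2, alpha=0.05, beta=0.20):
    """Approximate patients per group needed to detect p_1 versus p_2."""
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - alpha / 2)  # two-sided type I error
    z_beta = normal.inv_cdf(1 - beta)        # power = 1 - beta
    variance = p_1 * (1 - p_1) + p_2 * (1 - p_2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_1 - p_2) ** 2)


# Infection after total hip replacement: 0.5% versus 0.4%.
n_infection = n_per_group_two_proportions(0.005, 0.004)
# Avascular necrosis in SCFE: 50% versus 10%.
n_scfe = n_per_group_two_proportions(0.50, 0.10)
print(f"Hip infection example: about {n_infection:,} patients per group")
print(f"SCFE example: about {n_scfe} patients per group")
```

Under these assumptions the infection example needs tens of thousands of patients per group, whereas the SCFE example needs fewer than twenty.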


Examples from the Literature: Example of the Effects 
of Sample Size 

Source: Canadian Orthopedic Association. Reamed ver- 
sus unreamed intramedullary nailing of the femur: com- 
parison of the rate of ARDS in multiple injured patients. J 
Orthop Trauma 2006; 20(6):384-387.

Abstract: The objective was to compare the rate of Acute 
Respiratory Distress Syndrome (ARDS) in multiply in- 
jured patients with femoral shaft fractures, treated 
with intramedullary femoral nails inserted with or 
without reaming. Three hundred fifteen patients with 
322 femoral shaft fractures were stratified into 2 groups 
according to their estimated injury severity scores 
(ISS>18 versus ISS<18) and then randomized to receive 
an IM nail with either reamed or unreamed insertion 
for primary stabilization of their femoral shaft fracture. 
One hundred forty seven patients with 151 fractures re- 
ceived an unreamed nail whereas 168 patients with 171 
fractures, received a reamed nail. All fractures were 
nailed within 24 hours after their trauma. Three of the
63 multiply injured patients who received a reamed
nail developed ARDS as compared with 2 out of 46
patients in the unreamed group. This difference was not
statistically significant (P = 0.42). (The power for this
difference is only 5%; 39,817 patients are needed in each
group to detect a difference that small.) There were a total
of 4 deaths, 2 each in the reamed and unreamed groups.
No death resulted from ARDS. The authors concluded 
that the overall incidence of ARDS was found to be low 
with primary stabilization of femoral shaft fractures 
with intramedullary nailing. There was no difference 
in the incidence of ARDS between the reamed and un- 
reamed groups, given the sample size. 
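
The power and sample-size figures quoted in this abstract can be approximately reproduced. The sketch below is one plausible reconstruction, not the authors' stated method: it assumes a two-sided z test for two proportions with unpooled variance at α = 0.05, treats the observed rates (3 of 63 and 2 of 46) as the true rates, and uses hypothetical helper names.

```python
from math import sqrt
from statistics import NormalDist

NORMAL = NormalDist()


def power_two_proportions(p_1, n_1, p_2, n_2, alpha=0.05):
    """Approximate power of a two-sided z test comparing two proportions."""
    z_crit = NORMAL.inv_cdf(1 - alpha / 2)
    std_err = sqrt(p_1 * (1 - p_1) / n_1 + p_2 * (1 - p_2) / n_2)
    shift = abs(p_1 - p_2) / std_err
    return NORMAL.cdf(shift - z_crit) + NORMAL.cdf(-shift - z_crit)


def smallest_n_for_power(p_1, p_2, target=0.80):
    """Smallest equal group size reaching the target power (bisection).

    Assumes power is below the target for very small groups, which holds
    whenever the two rates are close together.
    """
    low, high = 2, 4
    while power_two_proportions(p_1, high, p_2, high) < target:
        low, high = high, high * 2
    while high - low > 1:
        mid = (low + high) // 2
        if power_two_proportions(p_1, mid, p_2, mid) < target:
            low = mid
        else:
            high = mid
    return high


p_reamed, p_unreamed = 3 / 63, 2 / 46
observed_power = power_two_proportions(p_reamed, 63, p_unreamed, 46)
needed = smallest_n_for_power(p_reamed, p_unreamed)
print(f"Power with 63 and 46 patients: {observed_power:.1%}")
print(f"Patients per group for 80% power: about {needed:,}")
```

Under these assumptions the power comes out at about 5% and the required group size lands close to the figure quoted in the abstract; small discrepancies would reflect a different formula or rounding.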
Putting It All Together


Consider now a single example encompassing all of the
principles articulated in this section. Suppose an orthopedic
surgeon would like to determine if computer-navigated
total knee replacements (TKRs) provide increased range of
motion versus standard nonnavigated total knee
replacements. First, the surgeon must consider what an
acceptable outcome measure would be. In this case, they could
consider range of motion at one year; then they would
set a difference that they considered clinically significant,
for example, 10 degrees. If 100 degrees were the norm
after a standard total knee replacement, the difference to
detect would be an increase of 10%. The surgeon could also
estimate the variance from the normal population.

Next the surgeon must define an acceptable risk level for
type I or α error, that is, an acceptable probability of
concluding there is improvement in range of motion with
navigation when in fact there is not. As explained, this is
typically set at 5%, so α equals 0.05.

Finally, the surgeon would consider the power and β
of the study. Here the surgeon must ask him- or herself
what would be an acceptable risk level of a false negative
conclusion; in other words, concluding there is no increase
in range of motion with navigation, when in fact there is
improvement. This is frequently set at 20%, so β is 0.2
with the corresponding power at 0.80 or 80%. This means
there is an 80% chance of detecting an increase in range of
motion with navigation, if one actually exists. Now the
surgeon should have enough information to use formulae to
define the necessary sample size.

Suppose the research is performed and the surgeon
finds that TKRs done with navigation provide increased
range of motion. Even assuming that the research
methodology was impeccable, there remains some risk that the
wrong conclusion has been reached and that, in nature,
there is no difference between navigated total knee
replacements and nonnavigated total knee replacements (i.e.,
an α error has occurred). However, that risk should be
less than 5% because α was set at 0.05. Conversely, if
the surgeon concluded there was no difference in the range
of motion between the two groups when in nature there
was, then a type II or β error has occurred. The risk of a
type II error is 20%.


Conclusions


It is important to note that errors can be made both when
accepting a hypothesis and when rejecting it. This concept
must always be at the forefront of our minds when
attempting to translate clinical research into practice. We
can lessen the impact of potential errors by recognizing
their cause and critically appraising the literature before
applying the results to patient care.
Clinical versus Statistical Significance


“Errors are not in the art, but in the artificers.”

— Sir Isaac Newton


Summary


Although a study may have shown statistical significance, it
may not have clinical relevance. In this chapter, how to
determine the clinical significance of study results is
discussed.


Introduction


One of the main functions of clinical research is to provide
results that will help surgeons care for patients. If a study
shows a difference in treatment effects between two
groups, it is important to know if this difference is
statistically significant. That is, did the treatment really have an
effect on a group as compared with those who did not
receive the same treatment? From preceding chapters we
learned that we can perform a hypothesis test to assess
statistical significance. Two groups being different
statistically, however, may not necessarily imply a clinically
relevant difference.


The Interpretation of Results


In clinical studies of therapy, we are generally comparing
and assessing two groups. These groups can be either a
control group and an experimental group (one that has
received the therapy being studied) or a comparison of
treatment A versus treatment B, which is more common in the
orthopaedic literature. In general, we are interested in the
effects of one treatment over another in each of the groups.
We have discussed in the preceding chapters how these
results can be presented and how we can determine if the
outcomes seen between the groups are truly different
from each other. In hypothesis testing, if it is concluded
that there is a difference between groups, this in
many instances is conveyed by a P value of <0.05. We
know that this means that the groups are statistically
significantly different, with a less than 5% chance that a
difference this large would arise by chance alone. Given that
this result is then considered statistically significant, what
does this mean for the practicing clinician? As clinicians, we
are interested in results that may affect our patients, that is,
clinically significant results. We shall see that clinically
significant and statistically significant are not synonymous
terms.

There are many different things that can affect how
clinically relevant a study is to our patients; the most obvious
are the overall study validity and the patient population
being studied. For this discussion, we will assume the
study is valid and the patient population is relevant. Given
those two conditions, to determine if a result is clinically
significant we must look at (1) how the results were
presented to us, and (2) whether clinically relevant outcomes
were used.

It has been argued that a statistically significant result,
that is, a difference in outcomes between two groups in a
trial as represented by a P value of <0.05, tells us nothing
of the magnitude of the treatment effect, nor does it tell
us whether there is a clinically significant difference. It
really only tells us that the two groups are different
in a statistical sense. Indeed, using only the P value may in
fact unduly influence clinicians regarding their perception
of the importance of study results. That is, if significant P
values are presented, some surgeons may feel the results of
a study are more important than if they are not presented.
As we have discussed in preceding chapters, the 95%
confidence interval (CI) tells us much more. It helps us
see the magnitude of a treatment effect and the direction of
this treatment effect, helps to determine clinical
significance, and, by way of its relationship to the P value,
shows whether there is a statistically significant difference
as well. As discussed in Chapter 21, for the CI to help us
determine clinically significant differences we must
understand what is clinically relevant to us and to the patients.
This relates, in part, to the outcomes being assessed. If the
outcomes are not actually clinically relevant, then a
statistically significant difference between treatments will
have no meaning when applying the results to patient care.
For example, if we hypothetically wish to study the effects of
using two different plating techniques to treat a humeral
shaft fracture, arguably using operative time as an outcome
is less important to patients than comparing reoperation
rates or nonunion rates. Although not every study is designed
to assess patient-important outcomes (for example,
biomechanical or surgical technique studies), we need to be
clear when applying results to patients that the outcomes used
are actually applicable to them, that is, clinically relevant.
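
The way a CI separates statistical from clinical significance can be sketched as follows. All inputs are hypothetical: a functional score with made-up group means, standard deviations, and sample sizes, and a 10-point minimal clinically important difference (MCID) chosen purely for illustration. The interval uses a normal approximation, and the function name is invented for the example.

```python
from math import sqrt
from statistics import NormalDist


def mean_difference_ci(mean_1, sd_1, n_1, mean_2, sd_2, n_2, level=0.95):
    """Normal-approximation CI for the difference in means (group 1 - group 2)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = mean_1 - mean_2
    std_err = sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    return diff - z * std_err, diff + z * std_err


MCID = 10.0  # hypothetical minimal clinically important difference, in points

lower, upper = mean_difference_ci(mean_1=82.0, sd_1=14.0, n_1=60,
                                  mean_2=74.0, sd_2=15.0, n_2=60)
statistically_significant = lower > 0 or upper < 0  # CI excludes "no difference"
clearly_important = lower >= MCID                    # whole CI beyond the MCID
possibly_important = upper >= MCID                   # CI reaches the MCID

print(f"Difference: 8.0 points, 95% CI {lower:.1f} to {upper:.1f}")
print(f"Statistically significant: {statistically_significant}")
print(f"Clearly clinically important: {clearly_important}")
print(f"Possibly clinically important: {possibly_important}")
```

With these invented numbers the interval excludes zero but straddles the 10-point threshold: the result is statistically significant, yet the data are compatible both with a clinically important effect and with one too small to matter.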


Key Concepts: Questions to Ask when Interpreting 

Results 

• Was the outcome a valid and reliable measure?

• Was the outcome itself clinically relevant?

• Was the outcome determined a priori?

• Was the outcomes assessor blinded to treatment
allocation?

• Was a “final” outcome timepoint used or were multiple
assessments at multiple timepoints done?


Examples from the Literature: Determining Clinical 
versus Statistical Significance 

Let us discuss a recently published randomized controlled
trial (RCT) to illustrate some of these principles.
This study compares open reduction internal fixation 
(ORIF) of Lisfranc joint injuries to primary arthrodesis 
of these injuries. It should be noted this critique is not 
about the study per se, but about what to look at when 
determining clinical versus statistical significance. 


The results of the study are listed in Table 24.1. If taken
at face value, we can see that the functional outcome
scores for the group treated with ORIF were significantly
lower than those for the group treated with primary
arthrodesis. The P value tells us that, if there were truly
no difference between the treatments, a difference at least
this large would be expected by chance less than 0.5% of
the time. The obvious conclusion is that those treated
with primary arthrodesis do better functionally at 2
years compared with those treated with ORIF. But is
this really true? The problem is that we do not know if
this difference in scores actually means anything
clinically or not.


To establish this, we need to know:

• If this is a valid and reliable health outcome measure
• If a 20-point difference in this scale truly reflects a
  clinically relevant functional difference
• Who administered the health outcome measure


Again, aside from the other methodological questions to 
ask, one can see that some initial questions need to be 
addressed before we can clearly say if there is a clinically 
important difference present between these groups. Ex- 
pressing these differences with CIs around the mean in- 
stead of only the P value would allow us to visualize the 
differences between groups and allow us to understand 
the clinical relevance of the result (as discussed in Chap- 
ter 21). To do this, we first assess what would be a clini- 
cally important treatment difference for our patients. 
For example, we may decide a 10-point difference in
the American Orthopaedic Foot and Ankle Society (AOFAS)
score is clinically relevant (possibly from experience
or previous literature among other things). If we
see that the CIs for the ORIF group still fall below what 
we determine to be a clinically significant treatment ef- 
fect, then the study would be clinically relevant for our 
patients. If they do not, then there is a chance that this 
study is not clinically relevant for our patients. We will 
illustrate this further below. 
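
One way to make this comparison concrete is to compute a CI for the difference between the two group means and to check it against the chosen threshold. The sketch below is only an illustration. The means and group sizes are those reported in Table 24.1, but the standard deviations (12 points in each group) are hypothetical, because they are not given in the table. The function name `mean_difference_ci` is our own, and the calculation uses a simple normal approximation, which is rough for groups of about 20 patients.

```python
from math import sqrt
from statistics import NormalDist


def mean_difference_ci(mean_a, sd_a, n_a, mean_b, sd_b, n_b, level=0.95):
    """Normal-approximation CI for the difference in means (group B minus group A)."""
    diff = mean_b - mean_a
    # Standard error of the difference between two independent means.
    se = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    # Two-sided critical value (about 1.96 for a 95% CI).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z * se, diff + z * se


# Means and group sizes from Table 24.1; both SDs are HYPOTHETICAL.
lower, upper = mean_difference_ci(68.6, 12.0, 20, 88.0, 12.0, 21)

# Hypothetical clinically important difference: 10 AOFAS points.
threshold = 10.0
clinically_relevant = lower > threshold

print(f"95% CI for the difference: {lower:.1f} to {upper:.1f} points")
print("Whole CI above the threshold:", clinically_relevant)
```

Under these assumed standard deviations the whole interval lies above 10 points. With larger standard deviations the lower limit would drop below the threshold and the conclusion would be less certain.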


So far, we have discussed the issue of using P values to
assess statistical significance and have made the argument
that this may not correspond to clinically significant
differences. The converse question is this: if a hypothesis
test is done and the P value shows no statistically
significant difference, does this also mean that there is no
clinically significant difference? It does not, because the
answer depends on the power of the study, which
in turn, is related to its sample size. Recall from Chapter 23
that if we have insufficient numbers of patients, we may 
find no statistically significant difference between groups. 
It may be that there truly is a difference and we are not able 
to measure it - this is a type II error. There is a possibility of 
being lulled into a false sense of security with the study re- 
sults when only the P value is reported. If we use the 95% 
CI, we can assess the upper and lower confidence limits. 


Table 24.1 Functional Outcome Scores for the Group Treated with ORIF versus the Group Treated with Primary Arthrodesis

Group A: ORIF (n = 20)                                        Group B: Primary Arthrodesis (n = 21)                         P Value
2 years postoperative: AOFAS score 68.6                       2 years postoperative: AOFAS score 88                         Statistically significant difference, p < 0.005
5 patients converted to arthrodesis                           2 patients need another procedure to obtain fusion            No statistical test done
Functioning at 62% of preinjury level (patient determined)    Functioning at 92% of preinjury level (patient determined)    Statistically significant difference, p < 0.005

Abbreviations: AOFAS, American Orthopaedic Foot and Ankle Society.
Source: Data from Ly TV, Coetzee JC. Treatment of primarily ligamentous Lisfranc joint injuries: primary arthrodesis compared with
open reduction and internal fixation. J Bone Joint Surg Am 2006; 88(3):514-520.
Even though there is no statistically significant difference,
if the range between the upper and lower confidence limits
includes what we would consider to be a clinically relevant
difference, then there is a chance that this difference truly
exists. This is especially true when we are talking about rare
or otherwise serious adverse events. One can see how important
this becomes if the adverse event is death and we
commit a type II error. We would then be saying that there
is no difference in death rates between groups when, in
fact, the study is not powered to assess this; in other
words, we could be wrong.


Confidence Intervals in Detecting Clinical Significance 


It was discussed in Chapter 21 how it is possible to use CIs 
to help determine clinical significance; it is worth revisit- 
ing briefly here. 


Examples from the Literature: Confidence Intervals in 
Detecting Clinical Significance 

Let us look at the results of a meta-analysis on operative 
fixation of Achilles tendon ruptures compared with non- 
operative management.’ 


In the article, the results were expressed in a forest plot
(Fig. 24.1a). There were six studies included in the
meta-analysis and the results are expressed as point estimates
with their corresponding 95% CIs. This again illustrates
how we can immediately see the magnitude and direction
of the treatment effect for each study. To assess
statistical significance, we need to see if any of the CIs
include “1.” If they do, then there is a chance that the
true value is in fact “1,” or no difference, and thus the
result is not statistically significant. To assess clinical
significance, we must first determine what a clinically
significant risk reduction of rerupture would be. Let us say
that it is hypothetically 10%. That is, if we can decrease
the rerupture rate of Achilles tendons by 10% in relative
terms with an operation, then we would call this clinically
significant. This corresponds to a relative risk of 0.9.


We then need only look at the point estimates and
their CIs (Fig. 24.1b). If a CI does not include 0.9 but
lies entirely below it (that is, to the left of it on the
plot), then we can say the result is clinically significant
for our practice.
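
The two checks described in this example (does the CI include 1, and does the CI lie entirely below 0.9) can be sketched in a few lines. The event counts below are hypothetical and are not taken from the meta-analysis. The function names are our own, and the CI uses the usual large-sample approximation on the log scale.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def relative_risk_ci(events_trt, n_trt, events_ctl, n_ctl, level=0.95):
    """Relative risk (treated vs. control) with a large-sample CI on the log scale."""
    rr = (events_trt / n_trt) / (events_ctl / n_ctl)
    # Standard error of ln(RR) for two independent groups.
    se_log = sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return rr, exp(log(rr) - z * se_log), exp(log(rr) + z * se_log)


def interpret(ci_lower, ci_upper, clinical_threshold=0.9):
    """Return (statistically significant?, clinically significant?)."""
    # Statistically significant if the CI excludes 1 (no difference).
    statistically_significant = ci_upper < 1.0 or ci_lower > 1.0
    # Clinically significant (benefit) if the whole CI is below the threshold.
    clinically_significant = ci_upper < clinical_threshold
    return statistically_significant, clinically_significant


# HYPOTHETICAL counts: 10/200 reruptures with surgery, 30/200 without.
rr, lower, upper = relative_risk_ci(10, 200, 30, 200)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
print(interpret(lower, upper))
```

With these assumed counts the whole interval lies below 0.9, so the result would count as both statistically and clinically significant under the 10% threshold chosen above.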


It is important to note that we need to assess the clinical
significance of all outcomes as well as adverse events.
This allows us to gain a full understanding of risks and
benefits. For example, the pooled results from the preceding
meta-analysis of operative versus nonoperative treatment
of Achilles tendon ruptures show both a statistically
significant decrease in the chance of rerupture and, using our
example, a clinically significant decrease in rerupture rates.
However, we need to know what the overall risk of operating is.
The wound infection rate was 4.7% in the above meta-analysis
for the surgical group. From this example, for every 10 patients
treated with surgery we prevent one rerupture; however,
for every 16 patients treated, one will develop a wound
infection. Is this risk acceptable, knowing that there is a
possibility of decreasing the rerupture rate with an operation?
What is a clinically acceptable complication rate? What do
our patients fear the most, rerupture or wound healing

Fig. 24.1 A pooled analysis of rerupture rates (n = 448 patients)
indicated a 68% reduction in the risk of rerupture with surgery
when compared with conservative treatment. (a) Survey result -
confidence levels. (b) Relative risk of rerupture. (From Bhandari M,
Guyatt GH, Siddiqui F, et al. Treatment of acute Achilles tendon
ruptures: a systematic overview and meta-analysis. Clin Orthop
Relat Res 2002; 400:190-200. Reprinted by permission.)
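
Statements of the form "for every N patients treated, one event is prevented (or caused)" come from taking the reciprocal of the absolute difference in event rates between groups. The sketch below shows that arithmetic only. The event rates are hypothetical values chosen so that the results reproduce the "one in 10" and "one in 16" figures quoted above. They are not taken from the meta-analysis, and the function names are our own.

```python
def number_needed_to_treat(control_event_rate, treated_event_rate):
    """Patients who must be treated to prevent one event (1 / absolute risk reduction)."""
    absolute_risk_reduction = control_event_rate - treated_event_rate
    if absolute_risk_reduction <= 0:
        raise ValueError("treatment does not reduce the event rate")
    return 1 / absolute_risk_reduction


def number_needed_to_harm(treated_event_rate, control_event_rate):
    """Patients treated for one additional adverse event (1 / absolute risk increase)."""
    absolute_risk_increase = treated_event_rate - control_event_rate
    if absolute_risk_increase <= 0:
        raise ValueError("treatment does not increase the event rate")
    return 1 / absolute_risk_increase


# HYPOTHETICAL rerupture rates: 13% without surgery, 3% with surgery.
nnt = number_needed_to_treat(0.13, 0.03)

# HYPOTHETICAL excess wound infection rate with surgery: 6.25% versus 0%.
nnh = number_needed_to_harm(0.0625, 0.0)

print(f"NNT = {nnt:.0f}, NNH = {nnh:.0f}")
```

Placing the two numbers side by side is one simple way to frame the risk and benefit discussion with a patient.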


Conclusions 


We can see that a statistically significant result may, in fact, 
not be clinically significant and that several factors may af- 
fect this. We must assess not only the overall study quality 
and patient population, but the results that are being pre- 
sented to us as well as the clinical relevance of the particu- 
lar outcomes used. This depends on the many factors dis- 
The Requirements of a Clinical Research Proposal


“I know quite certainly that I have no special gift. Curiosity, obsession, and dogged
endurance have brought me my ideas.”

- Albert Einstein


Summary


In this chapter, guidelines are provided to new investigators
on the requirements of a good clinical research proposal.
All well-designed studies are based upon a clear and
comprehensive proposal for the research that has gone through
multiple revisions. In the preparation of a successful
protocol, one must ask clinically important questions, conduct
systematic and comprehensive literature searches, select
the appropriate study methodology and study design,
determine the sample size using a sample size calculation,
and select the appropriate research team members. These
topics are detailed below.


Introduction


A high-quality clinical study or randomized controlled trial
often requires as much time in planning and preparation as
it does in its execution. Although there are hundreds of
steps required in the development of a study plan, this
chapter focuses upon important considerations that
should be addressed in the study protocol. All well-designed
studies are based upon a clear and comprehensive
proposal for the research.' When developing a study
protocol, input from all research team members is critical. It
is also important to consult with experts in the surgical
field, as well as with experts in health research methodology
and clinical trials management. Revisions of the proposal
are the rule, not the exception. In large, collaborative
trials, a proposal may undergo 20 to 30 revisions before it
is accepted by all team members including the statisticians,
methodologists, clinicians, and other research personnel.!
In the preparation of a successful protocol, the
researcher should ask clinically important questions,
conduct systematic and comprehensive literature searches,
select the appropriate study methodology and study design,
determine the sample size using a sample size calculation,
and choose the appropriate research team members.
Each of these items is discussed in this chapter and
a summary of the protocol format required by the
Canadian Institutes of Health Research (CIHR) for all protocols
for investigator-initiated randomized controlled trials is
provided (Fig. 25.1).


Formulating a Research Question


The question formulation typically includes a description of
the population, intervention, comparison groups, and
study outcomes. The question will also help to determine
the most appropriate study design. There is often a primary
research question, then multiple secondary research
questions. The research questions are then converted into
hypotheses and then quantified into the primary and
secondary objectives. Frequently, a comprehensive literature
review (described below) will identify key systematic reviews
or meta-analyses that will assist investigators in planning
their study. In some circumstances, the research question
being posed may have to be rewritten to be more consistent
with the current controversies in the research field.! The
box below shows an example of a well-developed research
question from a recently published surgical trial.








Ask a clinically important research question
Conduct systematic and comprehensive literature searches
Select the appropriate study methodology and study design
Determine the sample size using a sample size calculation
Select the appropriate research team members

Fig. 25.1 Guidelines for preparing a successful protocol.
Examples from the Literature: Research Question
Source: Gimbel JS, Walker D, Ma T, Ahdieh H. Efficacy 
and safety of oxymorphone immediate release for the 
treatment of mild to moderate pain after ambulatory 
orthopedic surgery: results of a randomized, double- 
blind, placebo-controlled trial. Arch Phys Med Rehabil 
2005;86:2284-2289. 


Research Question: What is the analgesic efficacy and 
safety of 5 mg of oxymorphone immediate release (IR) 
versus placebo for mild to moderate pain (Sum of pain 
intensity difference [SPID] from baseline to 8 hours) in 
outpatients (age > 18 years) undergoing knee arthro- 
scopy? 


Conducting a Comprehensive Literature Search 


Prior to committing large amounts of time, personnel, and 
funding to a project, investigators must ensure that their 
proposed study is novel and advances the current under- 
standing of a problem.! A careful and systematic review 
of the available literature can inform investigators about 
the current evidence to date.’ A well-conducted systematic 
review or meta-analysis is invaluable because it is unusual 
for single studies to provide definitive answers to clinical 
questions.°*” Moreover, a well-conducted quantitative re- 
view may resolve discrepancies between studies with con- 
flicting results.“ Please refer to Chapter 27, which provides 
details on how to conduct a comprehensive literature 
search. 


Jargon Simplified: Systematic Review and 
Meta-Analysis 

“We use the term systematic review for any summary of 
the medical literature that attempts to address a focused 
clinical question, and meta-analysis as a term for sys- 
tematic reviews that use quantitative methods (i.e., sta- 
tistical techniques) to summarize the results.”! 


Because conducting a comprehensive review of the litera- 
ture comes at the cost of time, effort, and other priorities, 
surgeons can also seek information from sources that ex- 
plicitly publish evidence summaries, critically appraised 
topics, and systematic reviews (Table 25.1). 


Table 25.1 Potential Information Resources 


The Cochrane Library (http://www3.interscience.wiley.com/ 
cgi-bin/mrwhome/106568753/HOME) 
Bandolier (http://www.medicine.ox.ac.uk/bandolier/) 


University of York/NHS Center for Reviews and Dissemination 
(http://www.york.ac.uk/inst/crd/) 

Medline (www.pubmed.gov) 

Ovid (www.ovid.com) 

HIRU (http://hiru.mcmaster.ca/hiru/) 

Evidence Based Medicine Center at Oxford 
(http://www.cebm.net/) 

ACP Journal Club (http://www.acpjs.org/) 


Source: From Bhandari M, Schemitsch EH. Planning a randomized 
trial: an overview. Tech Orthop 2004; 19:66-71. Reprinted by 
permission. 


The information obtained from the comprehensive lit- 
erature search is used to write the background and intro- 
duction sections, providing a summary of any relevant pre- 
vious trials including their methodology, results, and lim- 
itations. The literature summary should provide a solid 
justification for conducting the trial. It is also important 
to ensure that the research question has not already been 
adequately addressed in previous trials. 


Selecting the Appropriate Methodology and Study Design 


The research question helps to inform the study design." If 
the research question compares results between two sur- 
gical treatments, a randomized controlled trial is the best 
study design to use. Other study designs to consider are a 
prospective cohort study, a case-control study, and a clin- 
ical case series. Each of these study designs is discussed in
previous chapters of this book.

The protocol must be explicit about the criteria for in- 
cluding patients in the trial. A large and comprehensive 
list of eligibility criteria will limit the generalizability (ex- 
ternal validity) of the study results beyond the specific 
group of included patients. Thus, one strategy to improve 
the external validity of a randomized controlled trial is to
be inclusive in enrolling a diverse group of patients.'
Alternatively, in a cohort study where the risk of imbalances be-
tween patient groups is high, having a comprehensive list 
of inclusion and exclusion criteria may improve the degree 
to which patient groups are similar.! 


Jargon Simplified: Inclusion and Exclusion Criteria 
“Study investigators specify the inclusion criteria to 
define the population who will be eligible for the trial.” 
Exclusion criteria are “criteria that render potential sub- 
jects ineligible to participate in a particular trial.”° 
The box below provides a list of inclusion and exclusion
criteria from a recently published trial comparing two 
alternate surgical techniques. 


Examples from the Literature: Inclusion and Exclusion 
Criteria for a Recently Published Trial 

Source: Smith WR, Ziran B, Agudelo JF, Morgan SJ, Lahti 
Z, Vanderheiden T, Williams A. Expandable intramedul- 
lary nailing for tibial and femoral fractures: An analysis 
of perioperative complications. J Orthop Trauma. 
2006;20:310-16.8 

Inclusion Criteria: Patients with acute OTA types A1, A2,
A3, B1, and B2 diaphyseal fractures of the tibia or femur
who were candidates for intramedullary fixation, older
than 18 years, and treated by designated study
surgeons.

Exclusion Criteria: Metaphyseal fractures, diaphyseal
fractures with metaphyseal extensions, OTA types B3
and C fractures, Gustilo grade IIIB and IIIC fractures,
skeletal immaturity, and pathologic fractures.


It is also important to determine the primary and second- 
ary outcome measures, and provide a justification for se- 
lecting them. The research question(s) should also inform 
the primary and secondary outcomes measures. The pri- 
mary outcome parameter is the one that the investigators 
consider to be the most important (e.g., secondary surgical 
procedures).° Subsequently, any other measures are desig- 
nated to be secondary outcome parameters (e.g., infection 
rates). The initial distinction between primary and secondary
outcome measures is important because the number of
outcome parameters affects the threshold significance level
that needs to be used to determine if the result is significant
or not?

Outcomes can be classified as continuous or dichotomous.
Examples of continuous outcome measures include
length of hospital stay, amount of blood loss, and functional
scores. Mortality rates, infection rates, and reoperation
rates are examples of dichotomous outcomes.


Jargon Simplified: Continuous and Dichotomous 
Outcome Measures 

Continuous outcome measures are “variables that can 
theoretically take any value and in practice can take a 
large number of values with small differences between 
them.” Dichotomous outcomes “are ‘yes’ or ‘no’ out- 
comes that either happen or do not happen.”° 


It is important to consider using health-related quality of
life as an outcome measure in a clinical trial. Motivation
for measuring health-related quality of life should align it- 
self with a trial’s research questions. Therefore, the re- 
search question and the study objectives should be clearly 
stated and the investigator should have a comprehensive 
understanding of the disease or injury, expected benefits 
or harm of treatment alternatives, and how these factors 
may influence a patient’s health-related quality of life 
within the multidimensional construct of health (physical,
psychological, and social affects).!° When choosing a 
health-related quality of life measure for a surgical trial, 
the investigator must have a clear idea of the intended 
role of the measure when it was developed. Conducting 
an additional literature search and consulting with a clin- 
ical trial methodologist may help to inform the investiga- 
tor on different health-related quality of life measures. 


Jargon Simplified: Health-Related Quality of 

Life Measures 

Health-related quality of life measures are “measure- 
ments of how people are feeling or the value they place 
on their health state.”° 


One of the choices that investigators face when trying to 
identify an appropriate health-related quality of life mea- 
sure is whether to use generic or disease-specific instru- 
ments to measure health status. A generic instrument is 
one that measures general health status inclusive of physi- 
cal symptoms, function, and emotional dimensions of 
health. An example of a generic health-related quality of 
life measure is the Short Form-36 (SF-36).!! Because of 
their broad scope, generic instruments are useful for com- 
paring health status across different diseases, severities, 
interventions, and in some cases, across different cul- 
tures.! A disadvantage of generic instruments however, 
is that they may not be sensitive enough to be able to de- 
tect small, but important changes.!° 

Disease-specific measures are tailored to inquire about 
the specific physical, mental, and social aspects of health 
affected by the disease in question allowing them to detect 
small important changes.'° Because disease-specific in- 
struments are so focused, they do not allow for compari- 
sons between studies of different diseases, and sometimes 
do not even allow for comparisons between different po- 
pulations within the same disease.'° An example of a dis- 
ease-specific health-related quality of life measure is the 
Disabilities of the Arm, Shoulder and Hand (DASH) Out- 
come Measure.” Therefore, to provide the most compre- 
hensive evaluation of treatment effects, no matter the dis- 
ease or intervention, investigators often include both a dis- 
ease-specific and generic health measure.!° In fact, many 
granting agencies and ethics boards insist that a generic in- 
strument be included in the design of proposed clinical 
studies.!° 

Utility or preference measures are a unique form of gen- 
eric instruments that measure health status by quantifying 
wellness on a continuum anchored by death and optimum 
health." Through placement on a continuum with anchors 
of death and full health, preference measurement provides 
a means to compare alternative interventions, patient po- 
pulations, and diseases. It is particularly useful when at- 
tempting to measure the cost-effectiveness of competing 
interventions in which the cost of an intervention is re- 
lated to the number of quality-adjusted life-years (QALYs) 
gained.'° Examples of utility measures include the Health
Utilities Index Mark II/III and the EuroQol-5D.'?:4 


Key Concepts: Including Health-Related Quality as a 
Study Outcome 

“To provide the most comprehensive evaluation of treat- 
ment effects, no matter the disease or intervention, in- 
vestigators often include both a disease-specific and 
generic health measure. In fact, many granting agencies 
and research ethics boards insist that a generic instru- 
ment be included in the design of proposed clinical stu- 
dies.”!° 


There are several techniques available to administer
health-related quality of life instruments including personal
interviews, direct mail, telephone, and patient
self-administration. In addition, in cases where the patient is
too sick to complete questionnaires, a proxy or stand-in 
may answer questions on behalf of the patient.'° Table 
25.2 summarizes the strengths and weaknesses of the dif- 
ferent modes of administration.!° The choice of which 
method to use will depend largely on the research ques- 
tion, characteristics of the instrument, its items and re- 
sponse options, attributes of the patient population, and 
feasibility issues associated with cost and patient burden.!° 


Chapters 16 through 19 of this book provide additional 
information on selecting outcome measurements. 


Examples from the Literature: Examples of Outcome 
Measures in a Recently Published Surgical Trial 
Source: Kerkhoffs GM, Struijs PA, de Wit C, Rahlfs VW, 
Zwipp H, van Dijk CN. A double blind, randomized, par- 
allel group study on the efficacy and safety of treating 
acute lateral ankle sprain with oral hydrolytic enzymes. 
Br J Sports Med. 2004;38:431-435.'° 

Primary Outcomes 

Pain step test: The pain on walking one or two steps was 
assessed by using a visual analogue scale with a range of 
0 to 100 mm. 

Ankle volume: This was tested with a volometer (Boes] 
Medizintechnik, Aachen, Germany). This is a device to 
optoelectrically measure the circumference and volume 
of the foot. Only the volume of the ankle was measured, 
and measurements were expressed in plain numbers. 
Range of motion: This was measured with a goniometer 
by the neutral zero method. Both plantar flexion and 
dorsal flexion of the ankle were measured and summed 
for total range of motion. 

Secondary Outcomes

The only secondary criterion was “global judgment of
efficacy” assessed by the investigator.

Safety Criteria

Adverse events: The investigator recorded adverse
events based on his own impression and observation.
Global judgment of tolerability: This was assessed by
both patient and investigator.


Table 25.2 Modes of Health-Related Quality of Life Administration

Mode of Administration  Advantages                                                  Disadvantages
Interviewer             Maximal response rate                                       Costly
                        Can clarify questions                                       Interviewer bias
                        Higher completion rate                                      Reporting bias
                        Control over who is the respondent                          Characteristics of the interviewer (voice inflections,
                        Control over the order of questions                         age, race, gender) may introduce bias
Telephone               Greater response rate than direct mail                      Excludes those without access to a telephone
                        Relatively inexpensive                                      Voice inflections of the interviewer may introduce bias
                        Relatively quick data collection
                        Interviewer can probe for incomplete answers
                        Data collector can get clarification for ambiguous answers
Direct mail             Relatively inexpensive                                      Response rates generally low
                        No bias introduced through the interviewer                  Possibility of bias due to nonresponse
                        May reach more respondents                                  No control over who is the respondent
                        Respondents can take time to locate certain information     May misunderstand the question
                                                                                    May miss questions (incomplete)
                                                                                    Questionnaire may be lost in the mail
                                                                                    Excludes illiterate, less-educated, handicapped, and
                                                                                    non-English-speaking populations
Self                    Maximal response rate                                       May misunderstand the question
                        Inexpensive                                                 May miss questions (incomplete)
Proxy                   Can collect information on patients who otherwise are not   Response may differ from target
                        represented

Source: Adapted from Jackowski D, Guyatt G. A guide to health measurement. Clin Orthop Relat Res 2003;413:80-89. Reprinted by
permission.


Determining the Sample Size


Developing a protocol for a clinical trial and calculating the
appropriate sample size can be a difficult task for many
investigators. The choice of the outcome measure has important
implications on the required sample size.!

The statistical power of a study is the probability that it
will find a difference between two treatments when one
actually exists. By convention, investigators set the acceptable
study power to 80% (i.e., 20% chance of false negative
results).! Small studies are at risk of being underpowered
(study power <80%).


Jargon Simplified: Statistical Power

“Statistical power is a measure of how likely the study is
to produce a statistically significant result for a difference
between groups of a given magnitude (i.e., the ability
to detect a true difference).”'®


Power (1 − β) is simply the complement of the type II error
(β). Thus, if we accept a 20% chance of an incorrect study
conclusion (β = 0.20), we are also accepting that we will
come to the correct conclusion 80% of the time.! It is important
to note that study power can be used before the start
of a clinical trial to assist with sample size determination,
or following the completion of a study to determine if
the negative findings were true or due to chance.'”


Jargon Simplified: Beta Error (Type II Error)

Beta error or type II error occurs when the researcher
accepts the null hypothesis when it is false; he or she
concludes there to be no difference or treatment effect, when
there is. Beta refers to the probability of committing such
an error and by convention is usually set at 0.20.


A sample size calculation before the conduct of a trial can
help to estimate an appropriate sample size and to offset
false negative (β error) results. Before calculating the sample
size, the investigator should clearly state primary and
secondary outcome parameters (as described above).

A significance level of P = 0.05 is used by convention for
the main outcome parameter. That means that a chance of
5% is being accepted to conclude that there is a significant
difference between two groups, when in fact there is none
(type I error or α error).? For any additional secondary
outcome parameters, adjustments of the significance level
need to be made depending on the number of analyzed
parameters.


Jargon Simplified: Alpha Error (Type I Error)

Alpha error or type I error occurs when the null hypoth- 
esis is concluded to be false when it is true. The re- 
searcher concludes there to be a difference or treatment 
effect when there is not. Alpha refers to the probability 
of committing such an error, and by convention is 
usually set at 0.05. 
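
The adjustment of the significance level for several outcome parameters can be made in more than one way. A simple and conservative option, shown here only as an illustration, is the Bonferroni correction, in which the overall alpha is divided by the number of parameters tested. The text does not name a specific method, so the choice of Bonferroni, the function names, and the P values below are all assumptions made for this sketch.

```python
def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance threshold under a Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


def significant_after_adjustment(p_values, alpha=0.05):
    """Flag each P value as significant (True) or not at the adjusted threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


# HYPOTHETICAL P values for five secondary outcome parameters.
secondary_p_values = [0.004, 0.020, 0.030, 0.200, 0.049]

print(bonferroni_threshold(0.05, len(secondary_p_values)))
print(significant_after_adjustment(secondary_p_values))
```

In this made-up example only the first parameter would remain significant, although four of the five P values fall below the unadjusted 0.05 level.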


The magnitude of the difference in the primary outcome
parameter that the investigators consider clinically relevant
should be the basis for the sample size calculation,
for example, the difference in secondary surgical procedure
rates when comparing two different treatment methods.
Alternatively, this difference can be simply hypothesized.°
The sample size calculation will reveal how many
patients per group are necessary to show if that difference
truly exists or not. In addition to the hypothesized difference
in the primary outcome parameter and the significance
level (usually α = 0.05), the acceptable power of
the study and the anticipated standard deviations of the
primary outcome parameter in the two groups need to
be established before proceeding with the sample size
calculation.° A study power of 0.8 is a conventionally accepted
standard; this means that the investigators are willing to
accept a 20% probability of concluding that there is no
difference between two groups when a difference actually exists
(β error)? Any increase in study power or decrease in
the α level of significance will result in a higher sample
size requirement. The anticipated standard deviations in
the two groups can be determined either by performing
a preliminary pilot study or from data in the literature.®

Sample sizes can be calculated by hand or a computer 
program can be used. Numerous software packages exist 
for sample size determination. If no software is available, 
formulas can be used to determine sample size. It is best 
to consult with a biostatistician to be sure that the sample 
size is calculated correctly. 
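
As an illustration of the kind of formula referred to here, the sketch below uses the standard normal-approximation formulas for comparing two equal-sized groups with a two-sided test. It is a rough planning aid under stated assumptions, not a substitute for a biostatistician. It assumes a common standard deviation in both groups for the continuous outcome. The function names and all input values (a 10-point difference with a standard deviation of 15, and reoperation rates of 20% versus 10%) are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def _z_sum(alpha, power):
    """Sum of the critical values for a two-sided alpha and the desired power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.80
    return z_alpha + z_beta


def n_per_group_continuous(delta, sd, alpha=0.05, power=0.80):
    """Patients per group to detect a mean difference `delta` (common SD `sd`)."""
    return ceil(2 * (_z_sum(alpha, power) * sd / delta) ** 2)


def n_per_group_dichotomous(p1, p2, alpha=0.05, power=0.80):
    """Patients per group to detect event rates `p1` versus `p2`."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil(_z_sum(alpha, power) ** 2 * variance / (p1 - p2) ** 2)


# HYPOTHETICAL inputs: 10-point difference in a functional score, SD of 15.
print(n_per_group_continuous(delta=10, sd=15))

# HYPOTHETICAL inputs: reoperation rates of 20% versus 10%.
print(n_per_group_dichotomous(0.20, 0.10))

# Raising the power from 80% to 90% increases the required sample size.
print(n_per_group_continuous(delta=10, sd=15, power=0.90))
```

The results are per group, before any allowance for patients lost to follow-up. Dedicated software may give slightly larger numbers because it uses more exact methods.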


Key Concepts: Sample Size Calculations 
It is best to consult with a biostatistician to be sure that 
the sample size is calculated correctly. 


Even at best, a sample size calculation is based upon the 
best available “guesstimate” of treatment difference be- 
tween treatment groups. To improve the reliability of an 
a priori sample size calculation, investigators can conduct 
a pilot study of 20 to 50 patients to gain an estimate of the
treatment effect in their proposed study population. Addi- 


Selecting Research Team Members 


Once the basic research question, study design, and esti- 
mated sample size have been determined, the next step 
is to select and organize the research team. Arguably, de- 
veloping a team may be considered the first priority before 
moving forward on question development.’ However, 
thinking through the available literature, refining the 
study question, and determining a sample size for a study 
will have considerable impact on the size of the team and 
the level of expertise required.! 

A successful research project is supported by a moti- 
vated, cooperative, and competent research team; the op- 
timal team is greater than the sum of its parts.! Each team 
member should provide specific skills to ensure all compe- 
tencies are maintained. Typically, a team for a large study 
should have clinical experts (surgeons), biostatisticians, 
health research methodologists, epidemiologists, health 
economists, data managers, research coordinators, and ad- 
ministrative personnel. The role of each research team 
member is described in Chapter 35 and the organization 
of a methods center or a project office is discussed in Chap- 
ter 36. 


Key Concepts: Study Plan Timeframe 

“The conduct of a well-designed study requires a large 
time commitment. Often, in a large study, the planning 
phase will almost require as much time as the conduct 
phase. As a rule of thumb, one can assume that a study 
that will take one year to conduct, will likely take one 
year to plan (literature search, protocol development 
and revision, obtaining funding). Attention and detail 
to the study plan will limit problems during the conduct 
of the study.”! 


Organizing a Trial Protocol 


An example of how to format a protocol for a randomized 
controlled trial, as required by the Canadian Institutes of 
Health Research (CIHR) for all protocols (from all over the 
world) for investigator-initiated randomized controlled 
trials (Table 25.3), is provided in this section.” This format 
provides a detailed, comprehensive protocol, which is re- 
quired for a trial to be successful. Writing an inclusive, 
complete protocol is necessary for the execution of a suc 
cessful clinical trial. 


tional information on how to calculate sample sizes in sur- 
gical trials is provided in Chapter 30. 


Table 25.3 Example of a Format for a Good Research 
Proposal 


The Need for a Trial 

Has the importance of the issue been adequately explained in 
terms of present and future resource implications for Canadian 
health care and the economy in general? 

Are the hypotheses to be tested and/or the study objectives spe- 
cified and described clearly? 


Is the trial addressing the right question(s)? 


Are the reasons for the study and the changes that might be im- 
plemented as a result of the study adequately explained? 


What evidence is available to inform the need for and design of 
this trial (e.g., systematic reviews, professional and consumer 
consensus, pilot studies)? 


Is this the right time to conduct the trial with respect to current 
knowledge of the intervention and current use of existing tech- 
nologies? 

Is the proposed research compatible with the extent of the avail- 
able knowledge, nationally and internationally? 


What impact will the results have on the health of the population, 
or our understanding of the proposed intervention or underlying 
condition? 

Will the results of the trial be generalizable beyond the immediate 
research setting of the trial in a way that will maximize the impact 
of the results? 


The Proposed Trial 

The protocol must clearly and adequately answer the following 
questions: 

Is the study design appropriate to answer the research questions 
posed? 

Has sufficient account been taken within the study design of the 
issues of generalizability and representativeness? 

What is the justification for the hypothesis underlying the power 
calculations? 

Are the outcomes and their measures clearly described and ap- 
propriate to the scientific hypothesis? 


Has the trial population been defined adequately in relation to the 
target population so that the results will have meaning? 


Have the measures been validated specifically for the target po- 
pulation(s)? 

Is the control group appropriate? 

How will potential sources of bias be avoided? 


Details of the Trial Team 
The protocol needs to address the following questions about the 
trial team: 


Does the team of investigators proposed have the necessary 
range of disciplines and experience necessary to carry out the 
study? It has been noted that full applications fail often because 
the study team does not have the appropriate range and depth of 
content, methodological and biostatistical expertise. Experienced 
trialists and biostatisticians should be fully integrated into the re- 
search team. 


Has adequate statistical advice been sought and incorporated? 


Has adequate advice been sought and incorporated on other 
health services research issues if they are to be addressed? 


Source: Data from Canadian Institutes of Health Research (CIHR). 
Home page. Available at: http://www.cihr.ca. Accessed July 1, 2006. 
The protocol should begin by describing the problem that the
investigators plan to address in their clinical trial. The
primary research question should also be stated in this
section. A strong justification of why the trial is needed now
should be given, citing evidence from the literature,
professional and consumer consensus, and pilot studies, if
available. References to any relevant systematic reviews should
be provided and the need for the trial in the light of these
reviews should be discussed. If you believe that no relevant
previous trials have been done, provide details of the search
strategy used for existing trials. A description of how the
results of this trial will be used should be provided.


The Proposed Trial 


This section should begin by detailing the proposed trial
design, including whether the trial is open-label, single or
double blinded, etc. The planned trial interventions for both
the experimental and control groups and the proposed practical
arrangements for allocating participants to trial groups should
be described. For example, describe the randomization method;
if stratification or minimization is to be used, provide
justification for the chosen method and list the factors that
will be stratified or minimized. The proposed methods for
protecting against other sources of bias, such as blinding or
masking, should be described. If blinding is not possible,
explain why and give details of alternative methods proposed or
implications for interpretation of the trial’s results. The
planned inclusion and exclusion criteria should be listed with
justification as appropriate.

There should be a justification for the proposed duration 
of treatment period and the proposed frequency and dura- 
tion of follow-up. The rationale for the proposed primary 
and secondary outcome measures and how the outcome 
measures will be measured at follow-up should be provided.
Although it is not appropriate to include health-related
quality of life measures as an outcome in all trials, it is
necessary to justify why these measures are to be either
included or excluded. Some funding agencies require that any
proposed lower and upper age limits
for trial participants should be justified on scientific 
grounds.” Normally, for example, there should be no upper 
age limit on recruitment. Similarly, exclusion on the 
grounds of gender or sex should be justifiable on scientific 
grounds. Funding agencies may encourage the involve- 
ment of consumers and patient advocate groups, with 
the aim of better trial design and greater acceptability of 
both the trial and its findings. 

For the proposed sample size, state the sample sizes for both
the control and treatment groups, give a brief description of
the power calculations detailing the outcome measures on which
these have been based, and provide event rates, means, medians,
and so on, as appropriate. Provide a justification for the size
of difference that the trial is powered to detect and mention
whether the sample size calculation takes into account the
anticipated rates of noncompliance and loss to follow-up.
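
One common way to allow for loss to follow-up is to divide the calculated sample size by the proportion of patients expected to complete the study. The sketch below assumes this simple inflation rule; the function name and the 20% loss rate are hypothetical. Noncompliance is not modeled here, because it dilutes the treatment effect and is usually handled by adjusting the hypothesized difference instead.

```python
from math import ceil


def inflate_for_attrition(n_per_group: int, loss_rate: float) -> int:
    """Enroll enough patients that about `n_per_group` remain after losses."""
    if not 0 <= loss_rate < 1:
        raise ValueError("loss_rate must be in [0, 1)")
    return ceil(n_per_group / (1 - loss_rate))


if __name__ == "__main__":
    # Hypothetical: 63 analyzable patients per group, 20% expected loss
    print(inflate_for_attrition(63, 0.20))
```

With a calculated requirement of 63 patients per group and an anticipated 20% loss to follow-up, this rule suggests enrolling 79 patients per group.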

A description of the planned recruitment rate, how recruitment
will be organized, over what time period recruitment will take
place, and what evidence there is that the planned recruitment
rate is achievable should follow. Any likely problems with
compliance and loss to follow-up should be discussed, together
with the evidence on which the anticipated compliance and loss
to follow-up figures are based.
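
A quick feasibility check on the recruitment period can be sketched as follows. It assumes a steady flow of eligible patients and a constant consent rate, which real trials rarely achieve; the function name and all input values are hypothetical.

```python
from math import ceil


def months_to_recruit(total_patients: int, centers: int,
                      eligible_per_center_per_month: float,
                      consent_rate: float) -> int:
    """Whole months needed to enroll `total_patients` at a steady rate."""
    enrolled_per_month = centers * eligible_per_center_per_month * consent_rate
    if enrolled_per_month <= 0:
        raise ValueError("expected enrollment per month must be positive")
    return ceil(total_patients / enrolled_per_month)


if __name__ == "__main__":
    # Hypothetical: 158 patients, 10 eligible per center per month, 50% consent
    print(months_to_recruit(158, centers=1, eligible_per_center_per_month=10,
                            consent_rate=0.5))
    print(months_to_recruit(158, centers=4, eligible_per_center_per_month=10,
                            consent_rate=0.5))
```

Under these assumptions a single center would need 32 months to enroll 158 patients, whereas four similar centers would need 8 months, which shows why adding centers is often considered when recruitment would otherwise take too long.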

The next section of the protocol describes the details of 
the planned analyses, including any planned subgroup 
analyses and the proposed frequency of analyses (includ- 
ing any interim analyses). 

A section on economic issues is not a requirement for all 
trials; however, it is important to justify the inclusion or 
exclusion of any health economic studies and give details 
of any economic study proposed. If an economic analysis is to
be included in the protocol, the investigator should invite a
health economist, usually an individual with a doctorate in
health economics, to be a co-investigator. The health economist
will provide the expertise to successfully design and complete
an economic analysis.

The final section on the proposed trial should provide an 
accurate budget, budget justification, and the length of the 
trial. It is important to provide a detailed timeline and bud- 
get justification, to determine if the trial is feasible before it 
is initiated. If the patient recruitment phase is going to take 
an unreasonable amount of time, then the investigator 
should consider adding additional clinical centers or refin- 
ing the inclusion and exclusion criteria previously identi- 
fied. The staging of trials is encouraged to provide a realis- 
tic timetable for the completion of the study. In addition to 
the need for a sound basis for the projected recruitment 
rate, adequate provision should be made for setting up 
and staffing the trials office, obtaining ethics approval for 
all participating centers, a start-up phase, etc. Please refer 
to Chapter 31 for a guide on how much a study costs and 
how to make an accurate, detailed budget. 


Details of the Study Team 


The last section of the investigator-initiated protocol de- 
scribes the study team. This section explains the overall 
trial management, including the role of each applicant pro- 
posed, the Steering Committee, the Methods Center, the 
Central Outcomes Adjudication Committee, and whether 
a Data Safety and Monitoring Committee will be estab- 
lished and its composition. The Principal Investigator 
should identify and confirm the membership of each of 
the above committees prior to beginning the trial. 

All committee members should have the opportunity to 
provide input into the study protocol. All co-investigators 
and collaborators including a senior biostatistician should 
be located. It is also extremely important to identify the 
key research personnel involved in the study including a 
study coordinator, a data manager, a data analyst or statis- 
tician, a clinical research coordinator, and research and ad- 
ministrative assistants to ensure that the resources will be 


Conclusions 


The basis of each research protocol is a clinically important 
and novel research question. A careful overview of the lit- 
erature should provide the rationale for asking this ques- 
tion. Depending on the nature of the research question, 
an appropriate study design must be chosen. The primary 
and secondary outcomes measures should be stated in the 
protocol and should preferably comprise a generic health- 
The Identification of a Good Research Question 


“You can tell whether a man is clever by his answers. You can
tell whether a man is wise by his questions.”

— Naguib Mahfouz


Summary


The goal of this chapter is to guide surgeons through the
process of creating and refining a good research question.


Introduction


In debating treatment options with a colleague or
reconsidering your management of a patient, you experience
firsthand that the practice of medicine and surgery is rife
with uncertainty. The goal of clinical research is to find
answers to that which is unknown. The key to a successful
research initiative is finding an important question that can
be addressed with a feasible and valid study.

There are two types of research questions: descriptive and
analytic questions. Descriptive questions are answered by
observational studies (e.g., “What is the prevalence of wound
infection in diabetic patients after hip replacement
surgery?”). Analytic questions are answered by experimental
studies; they typically address the effect of an intervention
(e.g., “What is the impact of physiotherapy following hip
surgery on 90-day mobility in diabetic patients?”). Initial
research into a particular topic usually begins with
observational studies, which serve to provide information
regarding the distributions of diseases and characteristics of
the population. These are followed by experimental studies,
which attempt to show cause-and-effect relationships between
exposures and outcomes.


Jargon Simplified: Experimental and Observational Studies

• Experimental studies: Exposure to treatment is under
the control of the investigator. This allows for the
assignment of treatments to be randomized.

• Observational studies: Exposure to treatment is not
under the control of the investigator. Subjects either
self-select or are naturally exposed to the treatment.


Generating Research Questions

The Proper Attitude


The most important quality for any potential researcher to
possess is the proper “research attitude.” This arises from
the ability to keep an open mind, and to question and
critically evaluate current clinical practices. At the same
time, one should remain receptive to the new ideas and
experiences of other researchers. By striving to maintain an
outlook of “enlightened skepticism,” a researcher should find
no shortage of questions that need answering.!


Sources of Ideas


A clinical practice is the perfect breeding ground for
research ideas. Patients are an excellent source of research
questions. Uncertainties you may experience in the management
of their care can provide the spark for further research
investigation. The questions they ask may serve as
springboards for future projects. Teaching is also a reliable
source of research questions. Preparing presentations or
fielding questions from junior medical trainees can help
define in your own mind what areas require further study.
Many researchers also find it helpful to attend conferences
or engage their fellow medical professionals in discussion
to stimulate new ideas.


Examples from the Literature: Example of Generating 
a Research Question from Clinical Practice 

You see a 30-year-old man for a follow-up appointment 
in your office. He fractured his distal radius 6 weeks ago 
when he fell off his bicycle. No concerns arise from the 
X-ray or physical examination. You inform the patient 
that he is healing well, and that you will see him again 
in 1 month. 
The patient looks at you proudly. “Of course I'm doing 
well! My wife just picked up these L-arginine pills from 
the health food store, and I've been taking one a day 
for the past 2 weeks!” 

Because you have adopted an attitude of enlightened 
skepticism, you tell the patient that you are uncertain 
about the benefits of this particular supplement. You 
ask that he stop taking the pills while you investigate 
further. 


Literature Search 


Once an area of research has been chosen, it is important to 
become familiar with the current literature in the field of 
study. A systematic review of the literature will help to es- 
tablish expertise and to clearly delineate where further re- 
search needs to be done. In addition to doing a general lit- 
erature search of your subject area, a more focused search, 
relating directly to your research question, should also be 
done. It is important to phrase your question in such a
way that it is directly relevant to your patient. A
well-formulated question has four elements: the patient, an
intervention, a comparison, and an outcome (Table 26.1).” You
must consider all four elements when deciding whether 
results from your literature search (see Chapter 27) are 
applicable to your clinical scenario. 


Key Concepts: Keys to Generating a Good Research Question

• Practice enlightened skepticism

• Look to clinical practice, meetings, and readings for
sources of inspiration

• Remember: a thorough systematic review is part of
the research process

• Consider PICO (population, intervention, comparison,
outcome) when searching for studies that are directly
applicable to your research question


Table 26.1 Important Elements of a Focused Clinical
Question


Patient

How would I describe a group of patients similar to mine?

Are there comorbidities I should be thinking about?

Is the sex, age, or race of the patient relevant?


Intervention

What main intervention am I considering?

How can I ensure that all other cointerventions were the same
between groups?


Comparison

What is the main alternative to compare with the
intervention?

Are you comparing treatment to placebo? To current best
practice?


Outcome

What can I hope to accomplish, measure, improve, or affect?

What difficulties are associated with assessing your chosen
outcomes?


Examples from the Literature: Applying the PICO Model

After some background research on L-arginine, you feel
ready to ask a more focused question. Using the PICO
model, you attempt to find articles to address the
question:

• Population: Healthy male adults

• Intervention: Oral L-arginine supplementation

• Comparison: Placebo

• Outcome: Bone healing


Research Question: In healthy adult males, does oral
L-arginine supplementation affect bone healing compared
with placebo?

You find some articles detailing the beneficial effect of
L-arginine on bone healing in animals, but you are
unable to uncover any literature investigating this in
humans.


Characteristics of a Good Research Question


Armed with a research idea and having done a thorough
search of the literature, it is time to formulate the research
question. The mnemonic “FINER” outlines the characteristics
of good research questions. A research question should
be feasible, interesting, novel, ethical, and relevant.?


Feasible


The first consideration is to determine the number of
subjects that would be needed (refer to Chapter 30 on sample
size calculations). Many studies are unable to achieve their
intended purpose because they are unable to recruit a
sufficient number of subjects. It is important to plan for
individuals who refuse participation, do not fulfill the
inclusion criteria, or become lost to follow-up. It may be
necessary to carry out a pilot study to allow a more accurate
prediction of these variables. If difficulties are encountered
obtaining enough participants, adjustments to the inclusion
or exclusion criteria could be made, or the recruitment
period could be lengthened.

It is important to ensure that the investigator has the 
skills and knowledge necessary to plan and carry out the 
project. If such skills are lacking, collaboration with a co- 
investigator can bring additional expertise to the project. 
It is also a good idea to have a statistician involved from 
the beginning of the project. 
From the outset, an assessment of the necessary time 
commitment should be made, as well as a realistic esti- 
mate of the cost for each component of the study. Bear in 
mind that the need for time and money will likely exceed 
initial projections. Efforts to minimize cost may result in 
fewer or less extensive measurements on fewer subjects. 
The amount of time that an investigator is able to devote 
to research will determine what type of project can realis- 
tically be pursued. As an example, a question that would 
require a long follow-up time with subjects would not be 
appropriate for an investigator looking to complete a pro- 
ject in a brief period. 

It may be tempting to seize the opportunity to gather as 
much data as possible, in the hopes of addressing interest- 
ing side issues. Keep in mind that increasing the scope of a 
study necessitates an increase in both cost and time. It may 
also make increased demands on your study population, 
increasing the dropout rate. You may be better served by 
focusing on the main question at hand. 


Interesting 


Although research may be immensely rewarding, many 
frustrations will occur during the course of a project. A 
genuine interest in the question being asked will provide 
an incentive to overcome these obstacles, as well as pro- 
vide motivation to pursue new ideas and directions that 
arise. It would be wise to discuss your work with other in- 
vestigators in your field to ensure that your research is in- 
teresting to others, before investing too much time in a 
question that your peers may find unimportant. 


Novel 


There is little use in a research study that does not contri- 
bute new information. That being said, a question does not 
need to be completely original to have merit. A study can 
be done to confirm or refute the findings of a previous 
study. This is especially valuable if it avoids the methodo- 
logical shortcomings of earlier work. A study can also be 
done to extend the findings of a previous study. An exam- 
ple is asking whether the findings of one population apply 
to others, or asking whether different outcome measures 
are more appropriate. 


Ethical 


An investigator should take great care to ensure that his or 
her research study does not put its subjects at unreasonable 
risk of physical harm or invasion of privacy. Some research 
questions may be forever unanswerable for ethical reasons, 
but some can be addressed by careful modification of the 
study design. Discuss your project with your institution’s
ethics review board early in the planning process. 
Relevant 


This is the most important element to consider. If answer- 
ing your question would not influence clinical manage- 
ment, health policy, or future research efforts, the project 
is not likely to be worth pursuing. Try to imagine the var- 
ious outcomes that are possible, and ask, “So what?” This 
should give pretty clear guidance regarding the relevance 
of your research question. 


Key Concepts: Characteristics of a Good Research Question

• Feasible: Can you recruit enough subjects? Do you
have enough time and money? Is the project overly
ambitious?

• Interesting: Do you find the subject area exciting?

• Novel: Does your question add to our current
knowledge/understanding?

• Ethical: Have you considered the risks to your subjects’
person and privacy?

• Relevant: Does your question pass the “So what?” test?


Examples from the Literature: Developing a Good Research
Question

Consider our current research question: In healthy adult
males, does oral L-arginine supplementation affect bone
healing compared with placebo? Does it meet the FINER
criteria? Why or why not?

Feasible: Having done a sample size calculation, and
estimated the dropout rate to the best of your ability,
you now have an idea of how many subjects you
will need. Examine your practice. Can you recruit
this number of patients in a reasonable amount of
time? What will it cost to run your study? How will
the L-arginine supplements be funded? What sort of
fractures will you be looking at? Too specific (e.g.,
noncomminuted distal radial fractures), and you
may have trouble finding enough subjects. Too broad
(e.g., any fracture), and measuring outcomes becomes
problematic. What will you use to measure outcomes?
Is this method clinically relevant? Is it objective?
If not, how will you prevent bias?

Interesting: This patient population is a source of many
other research topics, if this one does not excite you.
The patient will wonder about his rehabilitation
regimen, about his pain control. Focus on an area that
genuinely interests you.

Novel: As of November 2006, a search on PubMed failed
to find any studies investigating the impact of
L-arginine on bone healing in humans. It would be
worthwhile to do a more comprehensive search. Be sure to
check databases of ongoing trials, as well as recent
conference proceedings. Discuss the idea with an active
researcher in the field, to ensure that similar
work has not already been done.

Ethical: Investigate the safety of L-arginine. What dose
do you intend to give to your subjects? How do you
know this is a safe dose? Remember to approach
your institution’s ethics board early. L-arginine
supplementation has been given to human patients in
trials for a variety of other illnesses, so there is good
reason to believe that its use is not harmful.

Relevant: If L-arginine did improve bone healing, so
what? If the effect was significant, it would mean
less morbidity for patients. Patients would be able to
start rehabilitation sooner and return to their normal
activities in less time.

Although more investigations should be done, it does
appear that the proposed research question satisfies
the FINER criteria.


Refining a Research Question


At this point, it is helpful to write down the research
question and an outline of your study plan. This will help to
clarify ideas and provide your colleagues with something
concrete to base their suggestions on. It is important to
realize that creating a research question and designing a
study to answer it are gradual processes. Ideas from your
colleagues and your own growing knowledge will continue
to shape your research project as time passes (Fig. 26.1).

For experimental studies, the research question must be
restated as a testable statement known as a hypothesis. A
hypothesis is an assumption about the relationship between
two or more variables. It represents the researcher’s best
guess at the outcome of the study. A carefully planned
hypothesis will be easy to understand and testable in terms of
measurable variables. The assumption that there is no
association between the intervention and the outcome is
termed the null hypothesis. The goal of most research is
to reject the null hypothesis.


Jargon Simplified: Null Hypothesis

A null hypothesis is the prediction that an observed
difference in outcomes is due to chance alone, and not due
to the effect of the intervention.


Finally, it is important to establish a single primary
outcome at the outset of the study. You may supplement
this with secondary outcomes, but planning of the study
and subsequent analysis will be based on the primary
outcome. Your primary outcome should be clinically relevant,
and objective if possible. If it is a subjective outcome, steps
should be taken to avoid bias in assessing the outcome.
This includes use of third party outcome assessors or an
adjudication committee. Members of an adjudication
committee independently assign an outcome, on the basis of
which a consensus outcome is formulated. This value
serves as a surrogate gold standard.


Key Concepts: Generating a Research Question

• If you keep a “research attitude,” ideas for research
questions will arise from many day-to-day clinical
activities.

• A literature search will help to develop expertise in a
subject area, as well as establish what research has
already been done.

• When creating a research question, remember the
mnemonic FINER, which summarizes the characteristics
of a good question.

• Refining your research question is a gradual process.

• Give careful consideration to the potential answers
(outcomes) to your question.
Fig. 26.1 Summary diagram for the development of a good
clinical question. Abbreviations: PICO, population, intervention,
comparison, outcome; FINER, feasible, interesting, novel, ethical,
relevant.


Conclusions


Maintain an attitude of enlightened skepticism, and
research ideas will come from all aspects of your clinical
practice. Once you have an idea, it is important to turn to
the literature, to become familiar with the subject area,
and to discover what work has already been done. A
good research question should be feasible, interesting,
novel, ethical, and relevant. Expect to return frequently to the
literature when you are developing your research question.
You will gradually refine your research question as
your expertise and knowledge increase. Having an
experienced researcher as a mentor is an invaluable resource
at all stages of this process.


How to Conduct a Comprehensive Literature Search 


“Search with your eyes, not with your nose!” 


Summary 


This chapter provides a description of the technique used
to frame a question that facilitates a literature search.
Next, common search engines and databases are described
and tips for making the most of searches are discussed.


— Dutch proverb told by mothers to their sons


Introduction


Traditionally, orthopaedic surgeons have subscribed to
journals that are appropriate to their field and scanned
the tables of contents for relevant or interesting articles.
With the ever-increasing number of information sources,
orthopaedic surgeons must consider a shift in paradigm
from traditional practice to evidence-based practice.
Evidence-based medicine (EBM) involves question formulation,
literature searches, validity assessment of available
studies, and appropriate application of research evidence
to individual patients. Browsing a journal’s table of contents
is no longer a sufficient method of finding important
articles. Furthermore, key articles are not always published
in orthopaedic or surgical journals, but can also be found in
medical journals. Effective literature searching is a skill
learned through training and experience. A librarian is a
well-trained resource in search strategies; it is advisable
to build a collaboration with your local medical librarian.
Without some guidance, a literature search can
be frustrating: it can result in numerous irrelevant hits or
it can fail to spot the most important articles. In this chapter,
guidelines are provided for a comprehensive literature
search to identify relevant articles.


Asking Answerable Questions


Before embarking on a literature search, one has to ask
answerable questions. A good question will help to structure
your approach. Two types of questions are relevant in this
respect.

Medical students or junior doctors generally ask background
questions, which focus on background information.
Traditionally, this type of information can be found
in textbooks or overview articles. An example is “What is
osteoarthritis?” This question invites a broad spectrum
of answers. The answers could include
concepts of osteoarthritis development, its pathophysiology,
incidence, and age or gender distribution.

Foreground questions, on the other hand, are asked by
clinicians to help in clinical decision-making. An example
is “How do I treat a 40-year-old female patient with a
midshaft femur fracture?” Thus, a foreground question is about
managing patients. A foreground question narrows down
the possible answers and is more to the point.


Key Concepts: Types of Questions
• Foreground question
• Background question


To help in ordering your thoughts, and thus to formulate an
answerable question, use the PICO format.


Key Concepts: Remember PICO 

• P: What is the population being studied?
• I: What is the intervention?
• C: What is the comparison intervention?
• O: What are the outcomes?


The PICO format will help you refine your question as is 
shown in Fig. 27.1. 

Framing the question will also elucidate the category of
study to be found. The study can relate to etiology, diagnosis,
therapy, or prognosis, and this will also determine
which terms need to be put in the search. In our example,
it would not be necessary to put older patients in the
search since osteoarthritis makes this implicit.


The patient-focused question:
• The patient is a previously healthy 40-year-old with a
  midshaft femoral fracture. Should I perform reamed nailing
  or should I use an unreamed nail?

The primary research question:
• In patients with femoral fractures, does reamed nailing reduce
  the chance of nonunion?

The secondary research questions:
• What is the evidence that using a reamed nail increases the
  chance of ARDS?
• What are other adverse events of reamed nailing or unreamed
  nailing?

The search elements (PICO):
• Population: Middle-aged patients with midshaft
  femoral fractures
• Intervention: Reamed intramedullary nail
• Comparison: Unreamed intramedullary nail
• Outcomes: Nonunion, ARDS, and other adverse events

Fig. 27.1 Elaborating and refining a clinical research question.
ARDS, acute respiratory distress syndrome.
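
The step from PICO elements to a search string can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (or_group, build_query) are hypothetical, and the grouping of synonyms is our own choice rather than a prescribed method. Synonyms for one concept are joined with OR, and the separate concepts are joined with AND.

```python
def or_group(terms):
    """Join synonyms for one concept with OR (parentheses only if needed)."""
    if len(terms) == 1:
        return terms[0]
    return "(" + " OR ".join(terms) + ")"


def build_query(concepts):
    """Join the concepts with AND; empty concepts are skipped."""
    return " AND ".join(or_group(terms) for terms in concepts if terms)


# Concepts taken from the femoral fracture example (Fig. 27.1).
concepts = [
    ["femur*", "femoral*"],  # Population: anatomical site, two spellings
    ["fracture*"],           # Population: the injury
    ["ream*"],               # Intervention and comparison share this stem
]
query = build_query(concepts)
print(query)
```

The explicit parentheses make the intended grouping clear regardless of how a given search interface orders its Boolean operators.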


Where to Search 


Numerous electronic databases are currently available.
They are relatively easy to search and contain more
current information than most print sources.
The choice of database will depend on the question to be
answered and on the time and resources available.


Key Concepts: Types of Evidence Sources 

• Preappraised: Abstracts or guidelines
• Summarized: Systematic reviews or meta-analyses
• Primary studies: Individual studies


A busy orthopaedic surgeon will use different databases to
find evidence-based answers in a different way than a
reviewer embarking on a systematic review or meta-analysis.
The busy clinician will try to find preappraised or
summarized evidence. The systematic reviewer wants to make
sure that no relevant article is missed for the review.
All databases are now accessible through the Internet.
You can ask your local medical librarian which databases
are available in your medical library. MEDLINE is
searchable without charge using the PubMed interface at
http://www.pubmed.gov. Another free search engine is
Google Scholar at http://www.scholar.google.com.


Sources of Preappraised Evidence

The number of complete practical guidelines is increasing,
although not all guidelines are evidence-based or kept
up-to-date. A simple Google search can reveal sources of EBM
guidelines. Guidelines are often specific to a local or
regional setting. Often, your local professional association has
developed guidelines, which can be found at their Web
site. Clinical guidelines are usually based on systematic
reviews and randomized controlled trials (RCTs), the highest
level of evidence. However, the final actual guideline is
based on consensus meetings. To find the most up-to-date
evidence, we suggest starting your search with systematic
reviews or meta-analyses. These address more focused
questions that are clinically relevant. If a systematic
review is not available to answer your question, you should
proceed to search for applicable RCTs. The search engine
Google at www.google.com can also be used to find additional
information. Recently, it has been found helpful in
identifying the correct diagnosis for a disease by searching
for its symptoms. However, using a general search engine
to identify Web sites on a medical topic should be done
with caution, as the identified Web sites may claim
results without the support of sound research evidence
(Table 27.1).


Table 27.1 Sources of Systematic Reviews and Preappraised
Evidence

The Cochrane Library at http://www.update-software.com/cochrane
or CD-ROM by subscription. Available free to UK NHS
employees via The National Electronic Library for Health (NeLH)
Program at http://www.library.nhs.uk/. Abstracts of systematic
reviews available free at www.update-software.com/cochrane.
Also in commercial databases, e.g., OVID’s evidence-based
medicine database

NHS Centre for Reviews and Dissemination database of abstracts
of reviews of effectiveness with links to other health technology
assessment databases
- http://crd.york.ac.uk/crdweb/

The physiotherapy evidence database
- http://www.pedro.fhs.usyd.edu.au/

Trauma Links, the Edinburgh Orthopaedic Trauma Unit site has
active links to useful orthopaedic trauma Web sites
- http://www.trauma.co.uk/unit/links.asp

Owl, Orthopaedic Web Links
- http://www.orthopaedicweblinks.com/

McMaster University, Health Information Research Unit -
Evidence-based health informatics
- http://hiru.mcmaster.ca

SUMsearch, an evidence-based search engine. Search results
displayed using hierarchy of evidence. Systematic reviews
(Cochrane) and possible systematic reviews (PubMed) displayed
after guidelines
- http://sumsearch.uthscsa.edu

TRIP database, evidence-based medical search engine
- http://www.tripdatabase.com/

MEDLINE, EMBASE, CINAHL may be able to limit search to
systematic reviews or randomized controlled trials depending
on the interface, otherwise need to use search filter.
Usually need to run a search to identify trials
- http://www.york.ac.uk/inst/crd/search.htm

Source: Adapted from Gillespie LD, Gillespie WJ. Finding current
evidence: search strategies and common databases. Clin Orthop
Relat Res 2003;413:133-145. Reprinted by permission.

Abbreviations: EMBASE, Excerpta Medica database; CINAHL,
Cumulative Index to Nursing and Allied Health Literature.


Google Scholar


With this search engine, you can search across many
disciplines and sources: peer-reviewed articles, theses, books,
abstracts, and conference proceedings from academic
publishers, professional societies, preprint repositories,
universities, and other scholarly organizations. This search
engine searches the entire World Wide Web including
MEDLINE. Google Scholar orders the results according to how
applicable to the query it considers references to be, taking
into account the full text of the manuscript, the author, the
publication in which it appeared, and how often it has been
cited in other scholarly publications. The hits are listed
with the most frequently cited articles at the top; more
recent articles can be found toward the bottom. Google
Scholar also shows how often an article is cited, and a link
is provided to the articles that cite the identified article.
It is helpful to read the instructions in the help function
to be informed about the most up-to-date features and
optimal use of this search engine. Advanced search tips are
explained in the help section as well and can be useful to
narrow your results.


MEDLINE 


MEDLINE is most frequently searched via the free online
interface PubMed. PubMed is provided by the United
States National Library of Medicine and the National
Institutes of Health. Use the Web site’s help function to learn of
the most up-to-date search features. Instructional videos
are provided in the tutorial function, ranging from basic
instructions to more advanced search strategies. The tutorials
are updated frequently and will help you in finding
the most recent evidence.

One specific feature is most helpful for busy clinicians.
For example, to find a relevant meta-analysis without having
to read a large number of titles, you may use the
“Clinical Queries” feature (arrow in Fig. 27.2). This feature
uses a programmed search strategy to direct you to EBM
resources and uses an efficient MEDLINE search strategy
based on Shojania’s work. Once you have clicked on
Clinical Queries, the screen for PubMed Clinical Queries
appears (Fig. 27.3). To search on our original query related
to femoral fracture, go to “Find Systematic Reviews”
and in the Search box type: femur* OR femoral* AND
fracture* AND ream*. The * makes sure your search will include
ream, reaming, reamed, and so on. This is called truncation
or wildcard use. In this example, the search resulted in
six possibly relevant titles (Fig. 27.4).


Jargon Simplified: Truncation

To find as many articles as possible, the search word can
be limited to its first letters. For example:
a search for “fractur*” will include fracture, fractures,
fractured, and fracturing.
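
What truncation does to a word list can be mimicked with a short Python sketch. The function names and the sample word list are our own illustrative assumptions; a real search interface applies truncation to its index, not to a list like this.

```python
import re


def truncation_to_regex(term):
    """Turn a search term with a trailing * into a whole-word pattern."""
    stem = term.rstrip("*")
    if term.endswith("*"):
        # The stem may be followed by any number of further letters.
        return re.compile("^" + re.escape(stem) + r"\w*$", re.IGNORECASE)
    # Without a wildcard, only the exact word matches.
    return re.compile("^" + re.escape(stem) + "$", re.IGNORECASE)


def matching_words(term, words):
    """Return the words that the (possibly truncated) term would pick up."""
    pattern = truncation_to_regex(term)
    return [word for word in words if pattern.match(word)]


words = ["fracture", "fractures", "fractured", "fracturing",
         "fraction", "refracture"]
matched = matching_words("fractur*", words)
print(matched)
```

Note that "fraction" and "refracture" are not retrieved: truncation only extends the end of the stem.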


The Clinical Queries feature can be used to search by clinical
study category or for systematic reviews. The clinical study
categories include etiology, diagnosis, therapy, prognosis,
and clinical prediction guides (Fig. 27.5). As opposed to a
basic PubMed search, the programmed search strategy will
likely result in studies with methodological rigor. A narrow
or specific search will include fewer, but methodologically
sound, studies, whereas the broad or sensitive search
option will yield more hits, including studies with weaker
methodological constructs. To keep up-to-date without
having to repeat your search, you can subscribe to “My
NCBI” for free. Here you can save your search and activate
an alert that informs you by e-mail as new articles
are found by your search strategy. (For handheld computer
users, a new PICO search system can be found at
http://pubmedhh.nlm.nih.gov/nlm/pico/piconew.html)

[Figure: PubMed Clinical Queries screen.]
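
The trade-off between a broad (sensitive) and a narrow (specific) search can be made concrete with a small worked example. The record numbers below are invented purely for illustration; sensitivity is taken here as the share of relevant articles that the search retrieves, and precision as the share of retrieved articles that are relevant.

```python
def search_performance(retrieved, relevant):
    """Return (sensitivity, precision) for one search result."""
    hits = retrieved & relevant
    sensitivity = len(hits) / len(relevant)
    precision = len(hits) / len(retrieved)
    return sensitivity, precision


# Invented example: five truly relevant articles exist in the database.
relevant = {1, 2, 3, 4, 5}
broad_search = set(range(1, 21))   # 20 hits, all relevant ones included
narrow_search = {1, 2, 3, 6}       # 4 hits, two relevant ones missed

broad = search_performance(broad_search, relevant)
narrow = search_performance(narrow_search, relevant)
print("broad :", broad)
print("narrow:", narrow)
```

In this made-up case the broad search finds everything but three of every four titles are irrelevant, whereas the narrow search is quicker to read but misses two relevant articles.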

MEDLINE can also be searched with OVID, but this search 
interface lacks the unique PubMed features such as “clini- 
cal queries” and it is a paid service. For a sensitive search 
(likely including most articles with less relevant ones as 
well), we suggest using a MeSH (medical subject headings) 
subheadings search. Searches using the systematic reviews 
search option try to include all relevant publications. This 
is an extensive search; hence, multiple databases are in- 
cluded in the search.° 


Jargon Simplified: MeSH 

MeSH is an acronym for Medical Subject Headings. It is 
the U.S. National Library of Medicine's controlled voca- 
bulary used for indexing articles for MEDLINE/PubMed. 
MeSH terminology provides a consistent way to retrieve 
information that may use different terminology for the 
same concepts (http://www.pubmed.gov). 


Another feature unique to PubMed is the related article
search, which at times may be helpful to identify additional
articles. Each article found in your primary search
has a link to the related articles. Finally, PubMed has the
capability to find specific citations by author or by journal.
The left-side menu on the PubMed Web site is
self-explanatory and opens an abundance of possibilities.


EMBASE 


Access to EMBASE (Excerpta Medica database) requires
a subscription. Please contact your local librarian for
details on how to access the database from your site. EMBASE
can be searched using the OVID search interface
(http://www.ovid.com). There will be an overlap of ~60%
compared with MEDLINE, but your search could find 30% of
articles that are unique to EMBASE. Therefore, an
EMBASE search in addition to a MEDLINE search is strongly
advocated, especially when embarking on a literature
search for a systematic review. Searching MEDLINE, but
not EMBASE, threatens to bias a meta-analysis by pulling
studies that show larger estimates of treatment effect.
To find methodologically sound trials, specific search
strategies unique to EMBASE have been designed and
tested. At this time, OVID lacks the “Clinical Queries”
interface used by PubMed to search MEDLINE, and finding
relevant studies is more difficult.


Jargon Simplified: Search Interface to Search 
Databases 

When searching databases such as MEDLINE or EMBASE, 
an interface will facilitate the search. The interface is an 
online software program developed to search databases. 
OVID software can be used online to search MEDLINE, 
EMBASE, and The Cochrane Library. PubMed is a differ- 
ent interface and can be used to search MEDLINE, but 
not EMBASE. To completely understand the unique fea- 
tures of the search program, use the online help system 
of each search interface. 


Search field tags can be used to limit the hits in a search
strategy. A search tag is placed in square brackets behind
a word, so that the search for that word is limited to the
specified field. Some commonly used tags are listed in
Table 27.2. The full list and explanations can be found at
www.pubmed.gov in the appendices of PubMed Help.
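
As a minimal sketch of how field tags attach to search terms, the following Python snippet builds a tagged query string. The helper name (tagged) and the choice of terms are illustrative assumptions; the three tags used (MH, TIAB, PT) are among those listed in Table 27.2.

```python
# A few of the tags from Table 27.2 (not the complete list).
KNOWN_TAGS = {"AU", "TA", "LA", "MH", "MAJR", "PT", "TI", "TIAB", "TW", "DP"}


def tagged(term, tag):
    """Place the field tag in square brackets behind the term."""
    if tag.upper() not in KNOWN_TAGS:
        raise ValueError("unknown field tag: " + tag)
    return term + "[" + tag.upper() + "]"


parts = [
    tagged("femoral fractures", "MH"),             # subject heading field
    tagged("ream*", "TIAB"),                       # title or abstract only
    tagged("randomized controlled trial", "PT"),   # publication type
]
query = " AND ".join(parts)
print(query)
```

The printed string can be pasted into the search box; whether it retrieves what you expect should still be checked against the interface's own help pages.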

To illustrate the use of tags, we will refer to a MEDLINE
literature search strategy designed for a systematic review
below. As you can see, search fields are used. To include as
many relevant titles as possible, truncation is used; for
example, randomized controlled trial is truncated down to
“random$” using the wildcard “$.” To offset missing titles
not using the word random$, the search also included
the words “clinical trial.” Each number corresponds to
one search; the history function is used to combine
searches. The operators (AND, OR, NOT) need to be
capitalized.
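
Combining numbered searches through the history function amounts to set operations on the records each search retrieved. The sketch below uses invented record numbers and hypothetical helper names to show the idea; it does not reproduce how any particular interface is implemented.

```python
def combine_or(history, numbers):
    """Records found by ANY of the numbered searches (like OR/1-2)."""
    return set().union(*(history[n] for n in numbers))


def combine_and(history, numbers):
    """Records found by ALL of the numbered searches (like AND/4,6)."""
    return set.intersection(*(history[n] for n in numbers))


def combine_not(history, keep, drop):
    """Records in search `keep` that are not in search `drop`."""
    return history[keep] - history[drop]


# Invented record numbers for four simple searches.
history = {
    1: {101, 102, 103},   # publication type: trial
    2: {103, 104},        # text word random$
    3: {104, 105},        # animal studies
    4: {101, 104, 106},   # topic terms
}
history[5] = combine_or(history, [1, 2])    # any trial-related record
history[6] = combine_not(history, 5, 3)     # remove animal studies
history[7] = combine_and(history, [6, 4])   # trials on the topic
print(history[7])
```

Each new line reuses earlier lines by number, which is why the order of the lines in a written search strategy matters.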


Table 27.2 MEDLINE Search Field Descriptions and Tags*

Affiliation [AD]                 Pagination [PG]
All Fields [ALL]                 Personal Name as Subject [PS]
Author [AU]                      Pharmacologic Action MeSH Terms [PA]
Entrez Date [EDAT]               Place of Publication [PL]
Filter [FILTER]                  Publication Date [DP]
First Author Name [1AU]          Publication Type [PT]
Full Author Name [FAU]           Publisher Identifier [AID]
Investigator [IR]                Secondary Source ID [SI]
Issue [IP]                       Subset [SB]
Journal Title [TA]               Substance Name [NM]
Language [LA]                    Text Words [TW]
Last Author [LASTAU]             Title [TI]
MeSH Date [MHDA]                 Title/Abstract [TIAB]
MeSH Major Topic [MAJR]          Transliterated Title [TT]
MeSH Subheadings [SH]            UID [PMID]
MeSH Terms [MH]                  Volume [VI]
NLM Unique ID [JID]
Other Term [OT]

Source: Courtesy of the National Library of Medicine.
* For details, see Help at http://www.pubmed.gov.


To design a search strategy for a systematic review, we
strongly suggest you collaborate with a librarian regarding
the complex nature of such a search strategy designed to
identify all relevant articles without overloading you
with results.


Examples from the Literature: Literature Search for
Metacarpal Neck Fracture in a Cochrane Review
Source: Poolman RW, Goslings J, Lee J, Statius MM, Steller
E, Struijs P. Conservative treatment for closed fifth
(small finger) metacarpal neck fractures. Cochrane
Database Syst Rev 2005;CD0032101.

1. Randomized controlled trial.pt.
2. Controlled clinical trial.pt.
3. Randomized Controlled Trials/
4. Random Allocation/
5. Double Blind Method/
6. Single Blind Method/
7. OR/1-6
8. Animal/ NOT Human/
9. 7 NOT 8
10. Clinical trial.pt.
11. Exp Clinical Trials/
12. (clinic$ adj25 trials$).tw.
13. ((singl$ or doubl$ or trebl$ or tripl$) adj25
    (blind$ or mask$)).tw.
14. Placebos/
15. placebo$.tw.
16. random$.tw.
17. Research Design/
18. OR/10-17
19. 18 NOT 8
20. 19 NOT 9
21. OR/9,20
22. Metacarpus/
23. boxer$ fracture$.tw.
24. little finger$.tw.
25. metacarp$.tw.
26. (fifth adj3 finger$).tw.
27. OR/22-26
28. Fractures/
29. fracture$.tw.
30. OR/28-29
31. AND/27,30
32. AND/21,31


Conclusions


The question must be framed properly to ensure a successful
literature search. Collaboration with a librarian to design
and perform a literature search is strongly advised;
the need for such a collaboration parallels that between a
surgeon and an anesthesiologist during surgery.


Guide to Planning a Randomized Trial


“Nothing will work unless you do.”
— Maya Angelou


Summary


The key considerations when planning a surgical randomized
controlled trial are the focus of this chapter. These
include the standardization of the surgical protocol, the
importance of blinding, limiting loss to follow-up,
implementing the intention-to-treat principle, randomization,
block and centralized randomization, and trial organization.


Introduction


In a setting where a randomized controlled trial (RCT) is
ethical and feasible, it is the optimal design for determining
treatment effectiveness when compared with observational
methods. The major advantage of a RCT is its ability
to limit bias. Orthopaedic surgeons often face several
difficulties when designing RCTs, and they may encounter
unexpected challenges as the trial progresses. As a result, few
surgical studies in orthopaedic trauma are designed as
RCTs, and many that are RCTs have been limited by
methodological deficiencies. In this chapter, a guide to planning
a RCT in orthopaedic surgery is provided. The following
aspects are addressed: standardization of the surgical
protocol, the importance of blinding, limiting loss to follow-up,
implementing the intention-to-treat principle, randomization,
block and centralized randomization, and trial
organization. Methodological challenges also must be
recognized and addressed to increase the quality of RCTs in
orthopaedic surgery. In addition, several items to consider
when setting up a randomization system are described.


Standardization of the Surgical Protocol


The individual skill of an orthopaedic surgeon participating
in a RCT may influence the trial’s outcome. Operations
are complex procedures, and clinical competence in a
surgeon’s performance requires frequent repetition over
time. Consequently, there is a learning curve for surgeons
who are performing new procedures; it is during this
learning curve that adverse events due to error are more
likely to occur. A further complication is that modifications
to the technique are frequently made at its inception.
As with any type of research study, it is important
to standardize the protocol for both the treatment
group(s) and the control group in an attempt to reduce
bias due to the variation in the participating surgeons’
techniques and skills.


Key Concepts: Randomized Controlled Trials 

The randomized controlled trial stands as the most rig- 
orous means for answering questions related to the ef- 
fects of preventive or therapeutic interventions because 
of its ability to limit bias. It is possible for well-designed 
and well-conducted randomized controlled trials to de- 
monstrate whether a new intervention is more effective 
and safer than an established intervention. Although 
there are many challenges when conducting rando- 
mized controlled trials in the specialty of orthopaedic 
trauma surgery, with appropriate knowledge and re- 
sources orthopaedic surgeons can conduct higher-qual- 
ity randomized controlled trials. 


Several solutions have been suggested in the general
surgical literature to limit this type of bias. Both Buchwald
and MacLeod proposed using only the best surgeons and
training them to use a uniform protocol for the operation
that is being tested. The advantage of this strategy is
that it is likely to increase the uniformity of the technique,
which consequently increases the likelihood of observing a
treatment effect. However, the disadvantage is that the
results may not be generalizable to other surgeons.

Chalmers suggested that RCTs using new operations 
should begin with the first patient who receives the new 
procedure, mostly due to ethical concerns.* However, 
MacLeod argues that if the first patients who receive a sur- 
gical technique are included in the trial, bias would likely 
result against the new procedure. In contrast, others
believe that the learning curve needs to be recognized
and evaluated using appropriate statistical techniques.
Authors have suggested several other methods of
standardizing the protocol, including: (1) ensuring that all
participating surgeons agree on how the procedure should be
performed, (2) holding teaching sessions prior to the trial, (3)
auditing surgical performance throughout the trial, and (4)
stratifying patients by surgeon at the time of randomization.
Stratifying by surgeon will not eliminate the variation
in how a procedure is performed, but it will ensure that
there is not an imbalance between groups. One of the
limitations of stratifying by surgeon in a surgical trial is that it
assumes that each surgeon’s skill level is the same for both
procedures, which is often an unlikely situation.

Similarly, instead of randomizing patients to operations,
patients could be randomized to surgeons who would
perform their operation of preference. This is referred to as
an expertise-based trial and it is discussed in greater detail
below.


Blinding or Masking 


Blinding or masking refers to any attempt by the investigators
to keep one or more of the individuals involved in the
trial unaware of the intervention. The purpose of blinding
is to reduce the risk of ascertainment or observational bias.
This bias is present when the outcome assessor is aware of
which intervention the patient is receiving; consequently,
outcomes assessment may be influenced by this knowledge.
In addition, if the treating physicians are aware of
which treatment a patient is receiving, the care they deliver
may also differ, which can influence the outcome. The
best way to avoid the psychological impact of treatment
(placebo effect) is to ensure that the patient is unaware
of the treatment they receive. Differences in patient care
other than the intervention under study can bias the results
of a study, which makes it important to blind the clinician.

Blinding can potentially be implemented at multiple levels
in a RCT: (1) the patient, (2) the clinicians who administer
the treatment, (3) the clinicians who take care of the
patients during the trial, (4) the individuals who assess the
patients throughout the trial and collect the data, (5) the
data analyst, and (6) the investigators who interpret and
write the results of the trial. In addition, it is always
possible to blind the members of the adjudication committee
who may assess the patient’s eligibility, protocol violations,
and study events. The role of the adjudication committee is
discussed in detail in chapter 38.


Key Concepts: Who Can Potentially be Blinded in an
Orthopaedic Randomized Controlled Trial?

1. The patient
2. The clinicians who administer the treatment
3. The clinicians who take care of the patients during
   the trial
4. The individuals who assess the patients throughout
   the trial and collect the data
5. The data analyst
6. The adjudication committee members
7. The investigators who interpret and write the results
   of the trial


Key Concepts: Standardization of the Surgical Protocol

1. Ensure that all participating surgeons agree on how
   the procedure should be performed
2. Hold teaching sessions prior to the trial
3. Audit surgical performance throughout the trial
4. Consider stratifying patients by surgeon at the time
   of randomization
5. Consider using an expertise-based design


Jargon Simplified: Stratified Randomization
“Stratified randomization is used to keep certain
characteristics of the participants (that is, age, weight, or
functional status) as similar as possible across treatment
groups. To achieve this, the investigators must first
identify factors (or strata) that are known to be related to the
outcome of the study. Once the factors are identified, the
next step is to produce a separate block randomization
scheme for each factor to ensure that the groups are
balanced within each strata.”
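
Stratified randomization with a separate block scheme per stratum can be sketched as follows. This is a minimal illustration under our own assumptions (two arms, strata defined by surgeon, blocks of four, hypothetical function names); a real trial would use a validated, centrally managed randomization system.

```python
import random


def block_sequence(n_blocks, block_size, arms, rng):
    """Permuted blocks: every block contains each arm equally often."""
    if block_size % len(arms) != 0:
        raise ValueError("block size must be a multiple of the number of arms")
    per_arm = block_size // len(arms)
    sequence = []
    for _ in range(n_blocks):
        block = list(arms) * per_arm
        rng.shuffle(block)          # random order within the block
        sequence.extend(block)
    return sequence


def stratified_schedule(strata, n_blocks, block_size, arms, seed):
    """One independent block-randomized allocation list per stratum."""
    rng = random.Random(seed)       # fixed seed keeps the list reproducible
    return {stratum: block_sequence(n_blocks, block_size, arms, rng)
            for stratum in strata}


schedule = stratified_schedule(
    strata=["surgeon A", "surgeon B"],
    n_blocks=3,
    block_size=4,
    arms=("reamed", "unreamed"),
    seed=2024,
)
for stratum, allocations in schedule.items():
    print(stratum, allocations)
```

Because each block is balanced, the two arms can never differ by more than half a block within any one surgeon's list.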


In orthopaedic surgical trials, blinding is possible
where the experimental intervention is a drug, such as a
thromboprophylactic agent; however, it is more difficult to blind
those orthopaedic surgical trials where the experimental
intervention is a surgical procedure or device. Blinding is
particularly difficult in orthopaedic trauma surgical trials,
and very few trials have successfully blinded both
surgeons and patients. In trials comparing medical and
surgical therapies, blinding is likely impossible because
sham operations are generally unethical and unpalatable
to patients. Even in trials comparing two different surgical
operations, there may be difficulty blinding patients and
outcome assessors if the incisions differ or the operations
differ in magnitude.

Consequently, many surgical and trauma trials are
referred to as “open randomized controlled trials,” as
everybody involved in the trial is aware of which intervention
the patient received. In some circumstances, using creative
measures, it is possible to ensure that outcome assessors
or data collectors, and possibly the patient, are blinded
to the patients’ treatment allocation. In addition, even in
open RCTs, it is possible to blind the individuals who
adjudicate outcomes, the data analysts, and the investigators
who write the results of the trial.


Jargon Simplified: Open Randomized Controlled Trials
Open randomized controlled trials occur when everybody
involved in the trial, including the investigators,
physicians, research participants, etc., is aware of
which intervention the patient received.


A recent review of orthopaedic surgical RCTs found the
following: (1) it was not possible to blind the surgeon in any
of the surgical RCTs; (2) the patient could be blinded in 87%
of the RCTs; (3) the outcome assessor (or data collector)
could be blinded for all outcomes in 70% of the RCTs and
for some outcomes in 90% of the RCTs; and (4) the data
analyst could be blinded in 100% of the RCTs. This study did
not measure whether members of an adjudication committee
could be blinded. In orthopaedic trauma trials
that are drug trials, it is possible to blind the surgeon, the
outcome assessor, and the data analyst in 100% of the
RCTs and the patient in 96% of the RCTs.

A review of nonsurgical RCTs published in the orthopaedic
literature (trials that did not involve a surgical procedure,
such as stimulation modalities, physiotherapy treatments,
and medical devices) found the following: (1) the
surgeon could be blinded in 56% of the RCTs; (2) the patient
could be blinded in 44% of the RCTs; (3) the outcome assessor
could be blinded for all outcomes in 88% of the RCTs and
for some outcomes in 100% of the RCTs; and (4) the data
analyst could be blinded in 100% of the RCTs.


Key Concepts: Blinding

It is possible to blind at least some of the individuals
involved in an orthopaedic trauma trial as a means of
reducing bias.


Limiting Loss to Follow-Up


Given that RCTs in orthopaedic surgery have typically
reported 10% (but up to 30%) loss to follow-up of included
patients, it is especially important for orthopaedic
researchers to strive to limit loss to follow-up when planning
a RCT. In addition, orthopaedic surgery trials often
involve lengthy follow-up periods, from several months to
several years.

It is extremely important to consider strategies for reducing
loss to follow-up when planning a RCT. Sprague et al
proposed several strategies for reducing loss to follow-up
including: (1) excluding individuals who are likely to present
problems in follow-up (however, this may decrease
the generalizability of the study); (2) at the time of
randomization, have the patient provide their own address and
phone number, the name and address of their primary
care physician, and the names and addresses of three
people with whom the patient does not live; (3) provide
participants with information on their injury, their
complications, potential treatment effects, expectations for
personal benefit from study participation, and motivation for
adherence with follow-up visits and research protocols;
(4) provide patients with reminders for upcoming clinic
visits; (5) have the trial’s follow-up schedule coincide
with normal surgical follow-up clinic visits; (6) have study
personnel contact patients frequently to maintain contact
and obtain information about any planned change in
residence; and (7) if a patient refuses to return for a follow-up
assessment, determine the patient’s status by telephone
contact with the patient or the patient’s family physician.
These strategies should be clearly outlined in both the
study protocol and the research coordinators’ manual.


Key Concepts: Loss to Follow-Up

Strategies to reduce loss to follow-up should be determined
when planning a randomized controlled trial
and be included in both the study protocol and the
research coordinators’ manual.


The Intention-to-Treat Principle


Randomization can accomplish the goal of balancing
groups with respect to both known and unknown determinants
of outcomes only if patients are analyzed in the
groups to which they are randomized. Analyzing patients
based upon the treatment they actually receive can
destroy the prognostic balance of randomization because
the reasons that patients do not take their medication or
do not receive a particular surgical intervention are often
related to prognosis. Excluding noncompliant patients
from the analysis may remove a group of patients with a
worse prognosis, and the remaining patients may be
destined to have a better outcome. Therefore, removing
noncompliers may destroy the unbiased comparison provided
by randomization.

Intention to treat is a strategy for analysis of RCTs that
compares patients in the groups to which they were
originally assigned. This is often interpreted as including all
patients, regardless of whether they actually satisfied the
entry criteria, received the treatment to which they were
randomly allocated, subsequently withdrew from the
study, or deviated from the protocol. However, Fergusson
et al argue that excluding patients from the primary
analysis may be legitimate when study personnel made errors in
the implementation of eligibility criteria or patients never
received the interventions. In these cases, excluding
patients does not introduce bias and may lead to a more
informative analysis if an independent, blinded adjudication
committee makes this determination after evaluating all
randomized patients. Investigators should not exclude
patients from the intention-to-treat analysis if the
treatment could have influenced the ultimate decision
regarding exclusion, as may occur with excessively broad
eligibility criteria.


Jargon Simplified: Intention to Treat 

Intention to treat is a strategy for the “analysis of out- 
comes based on the treatment arm to which patients 
were randomized, rather than which treatment they ac- 
tually received.”!® 


The intention-to-treat approach helps to preserve prognostic
balance in the study arms, emphasizes greater accountability
for all patients entered into the study, and
consequently minimizes the influence of withdrawals, 
noncompliers, and patients lost to follow-up.'*"° The in- 
tention-to-treat strategy also allows for greatest generaliz- 
ability of the study results. In addition, because an inten- 
tion-to-treat analysis is the most cautious approach to 
take, it minimizes the risk of a type I error.'® However, 
critics of the intention-to-treat strategy argue that such 
an analysis is less likely to show a positive treatment effect, 
especially in studies that randomize patients who have 
little or no chance of benefiting from the intervention.'® 


Jargon Simplified: Type I Error

Type I error (alpha error) occurs when the null hypothesis
is concluded to be false when it is true. The researcher
concludes that there is a difference or treatment
effect when there is none. Alpha refers to the probability
of committing such an error, and by convention is
usually set at 0.05.


Randomization


Randomization introduces a deliberate element of chance
into the assignment of treatments to subjects in a clinical
trial. During subsequent analysis of the trial data, it provides
a sound statistical basis for the quantitative evaluation of
the evidence relating to treatment effects. It also tends to
produce treatment groups in which the distributions of
prognostic factors, known and unknown, are similar. In
combination with blinding, randomization helps to avoid
possible bias in the selection and allocation of subjects arising
from the predictability of treatment assignments.'*'®


Jargon Simplified: Randomization

Randomization is the “allocation of participants to
groups by chance, usually done with the aid of a table
of random numbers. Not to be confused with systematic
allocation (e.g., on even and odd days of the month) or
allocation at the convenience or discretion of the investigator.”!°


Crossovers pose a serious problem in the analysis of data. If 
data are analyzed according to the original assignment, 
there may be some patients who received treatment B in 
the treatment A group, and there may be some patients 
who received treatment A in the treatment B group. How- 
ever, if data are analyzed according to the treatment that 
patients actually receive, randomization is broken. Current 
practice is to analyze data according to the original rando- 
mization; however, if there are many crossovers, the 
meaning of the study results will be questionable.® Because 
there are no perfect solutions, the number of crossovers 
must be kept to a minimum. 
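
The effect of crossovers on the two analyses can be illustrated with a small sketch. The patient records below are entirely hypothetical and are chosen only to show how grouping by assigned treatment (intention to treat) and grouping by treatment received can give different answers.

```python
from collections import defaultdict

# Hypothetical records: (arm assigned, arm actually received, outcome event?)
# One patient in each arm crossed over to the other treatment.
patients = [
    ("A", "A", False), ("A", "A", False), ("A", "A", True), ("A", "B", True),
    ("B", "B", True), ("B", "B", True), ("B", "B", False), ("B", "A", False),
]


def event_rates(records, arm_index):
    """Return the event rate per arm, grouping on the given tuple position."""
    events, totals = defaultdict(int), defaultdict(int)
    for record in records:
        arm = record[arm_index]
        totals[arm] += 1
        events[arm] += int(record[2])
    return {arm: events[arm] / totals[arm] for arm in sorted(totals)}


# Intention to treat: group by original random assignment (position 0).
itt_rates = event_rates(patients, 0)

# As treated: group by treatment received (position 1); randomization is broken.
as_treated_rates = event_rates(patients, 1)

print("Intention to treat:", itt_rates)
print("As treated:        ", as_treated_rates)
```

In this toy data set the intention-to-treat comparison shows equal event rates, whereas the as-treated comparison suggests a difference that is produced only by which patients crossed over.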

When planning an RCT, methods for reducing crossovers
should be discussed and strategies detailed in both the 
study protocol and the manual of operations. These in- 
clude providing clear inclusion and exclusion criteria, ran- 
domizing patients as close as possible to surgery, and en- 
suring that all participating surgeons are comfortable per- 
forming both techniques. 


Key Concepts: Methods for Reducing Crossovers in a
Surgical Trial

• Have clear inclusion and exclusion criteria
• Explain the treatment options clearly to study participants
• Randomize patients as close to surgery as possible
• Ensure that the attending surgeon is willing to perform
  either procedure on the patient
• Consider using an expertise-based design


The randomization treatment schedule of a clinical trial 
documents the random allocation of treatments to partici- 
pants. In the simplest situation it is a sequential list of 
treatments (or treatment sequences in a crossover trial) 
or corresponding codes by study identification number. 
The logistics of some trials, such as those with a screening 
phase, may make matters more complicated, but the un- 
ique preplanned assignment of treatment, or treatment 
sequence, to subject should be clear. Different trial designs 
will necessitate different procedures for generating rando- 
mization schedules. The randomization schedule should 
be reproducible. 
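
As a minimal sketch of a reproducible schedule in the simplest (unrestricted) case, the example below draws one treatment per study identification number from a seeded random number generator. The function name, the treatment codes "A" and "B", and the seed value are illustrative assumptions, not part of any trial's actual procedure.

```python
import random


def simple_schedule(n_participants, treatments=("A", "B"), seed=20240101):
    """Sequential list of (study ID, treatment) pairs.

    A fixed seed (a hypothetical value here) makes the schedule
    reproducible: the same seed always regenerates the same list.
    """
    rng = random.Random(seed)  # private generator, unaffected by other code
    return [
        (study_id, rng.choice(treatments))
        for study_id in range(1, n_participants + 1)
    ]


schedule = simple_schedule(10)
for study_id, treatment in schedule:
    print(study_id, treatment)
```

Because allocation here is unrestricted, the two groups are not guaranteed to be of equal size.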

Although unrestricted randomization is an acceptable 
approach, some advantages can generally be gained by 
randomizing research participants in blocks.'® This helps 
to increase the comparability of the treatment groups, par- 
ticularly when research participant characteristics may 
change over time, as a result, for example, of changes in 
recruitment policy. It also provides a better guarantee that
the treatment groups will be of nearly equal size.’ In 
crossover trials, it provides the means of obtaining ba- 
lanced designs with their greater efficiency and easier in- 
terpretation. Care should be taken to choose block lengths 
that are sufficiently short to limit possible imbalance, but 
that are long enough to avoid predictability toward the 
end of the sequence in a block.'® Investigators and other re- 
levant staff should generally be blind to the block length; 
the use of two or more block lengths, randomly selected 
for each block, can achieve the same purpose.!® 
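
Randomization in blocks with randomly varying block lengths might be sketched as follows. The block lengths (4 and 6), the seed, and the function name are assumptions made for illustration only.

```python
import random


def blocked_schedule(n_participants, treatments=("A", "B"),
                     block_sizes=(4, 6), seed=7):
    """Permuted-block allocation list with a randomly chosen length per block.

    Every block contains each treatment equally often, so the groups are
    balanced at the end of each block. Choosing the block length at random
    makes the last assignments in a block harder to predict.
    """
    if any(size % len(treatments) for size in block_sizes):
        raise ValueError("block sizes must be multiples of the number of treatments")
    rng = random.Random(seed)  # fixed seed keeps the schedule reproducible
    allocations = []
    while len(allocations) < n_participants:
        size = rng.choice(block_sizes)
        block = list(treatments) * (size // len(treatments))
        rng.shuffle(block)  # random order within the block
        allocations.extend(block)
    return allocations[:n_participants]


schedule = blocked_schedule(20)
print(schedule)
print("A:", schedule.count("A"), "B:", schedule.count("B"))
```

With two treatments and block lengths of 4 and 6, the running difference in group sizes never exceeds 3 at any point in the sequence.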


Block Randomization 


In multicenter trials, the randomization procedures should 
be organized centrally. It is advisable to have a separate 
random scheme for each center, i.e., to stratify by center 
or to allocate several whole blocks to each center.'? More 
generally, stratification by important prognostic factors 
measured at baseline (e.g., severity of disease, age, sex) 
may sometimes be valuable to promote balanced alloca- 
tion within strata; this has greater potential benefit in 
small trials." The use of more than two or three stratifica- 
tion factors is rarely necessary, is less successful at achiev- 
ing balance, and is logistically troublesome.'? Factors on 
which randomization has been stratified should be accounted
for later in the analysis.!®


Jargon Simplified: Stratification 

Stratification means that the person or computer rando- 
mizing patients to each group tries to assign roughly 
equal numbers of patients with similar baseline or 
health characteristics to each type of treatment. The fac- 
tors to be stratified for are identified from the outset of 
the trial, as part of the study protocol. Stratification is 
done to help ensure that true conclusions can be made 
about the usefulness and safety of each treatment being 
studied. 


The next research participant to be randomized into a trial 
should always receive the treatment corresponding to the 
next free number in the appropriate randomization sche- 
dule (in the respective stratum, if randomization is strati- 
fied).'° The appropriate number and associated treatment 
for the next research participant should only be allocated 
when entry of that research participant to the randomized 
part of the trial has been confirmed.'® Details of the rando- 
mization that facilitate predictability (e.g., block length) 
should not be contained in the trial protocol, to ensure
that concealment is obtained. The randomization schedule
itself should be filed securely by the sponsor or an inde- 
pendent party in a manner that ensures that blindness is 
properly maintained throughout the trial.'® Access to the 
randomization schedule during the trial should take into 
account the possibility that, in an emergency, the blind 
may have to be broken for any research participant in a 
pharmaceutical trial. The procedure to be followed, the ne- 
cessary documentation, and the subsequent treatment 
and assessment of the research participant should all be 
described in the protocol. 
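
The rule that each participant receives the next free number in the schedule for his or her stratum, and only after entry has been confirmed, can be sketched as below. The class name, the stratum labels, the block size, and the seed are hypothetical; a real system would also store the schedule securely and log every allocation.

```python
import random


class StratifiedAllocator:
    """Hands out the next free schedule entry within each stratum."""

    def __init__(self, strata, treatments=("A", "B"),
                 block_size=4, blocks_per_stratum=5, seed=11):
        rng = random.Random(seed)
        self._schedules = {}
        for stratum in strata:
            schedule = []
            for _ in range(blocks_per_stratum):
                block = list(treatments) * (block_size // len(treatments))
                rng.shuffle(block)
                schedule.extend(block)
            self._schedules[stratum] = schedule  # separate schedule per stratum
        self._next_index = {stratum: 0 for stratum in strata}

    def allocate(self, stratum, entry_confirmed):
        """Return (sequence number, treatment) for the next participant."""
        if not entry_confirmed:
            raise ValueError("entry to the randomized part of the trial is not confirmed")
        index = self._next_index[stratum]
        treatment = self._schedules[stratum][index]  # IndexError once the schedule is used up
        self._next_index[stratum] = index + 1
        return index + 1, treatment


# Hypothetical strata: center crossed with baseline severity.
allocator = StratifiedAllocator(strata=[("site-1", "severe"), ("site-1", "mild")])
print(allocator.allocate(("site-1", "severe"), entry_confirmed=True))
print(allocator.allocate(("site-1", "mild"), entry_confirmed=True))
```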


Centralized Randomization 


To ensure concealed treatment allocation (i.e., the inability 
of participating investigators to determine the treatment 
allocation of the next enrolled patient into a trial), a cen- 
tralized randomization system is optimal.’ Many multi- 
center trials randomize patients by envelopes distributed 
to each center; however, envelopes are not tamper-proof 
and do not ensure concealed randomization. A 24-hour re- 
mote randomization system is the best method to conceal 
randomization in multicenter trials. This can be achieved 
by a 24-hour research pager system at the methods center
or by computerized telephone or Internet randomization 
systems. 

A central computer-automated randomization system 
allows study sites to enroll patients in clinical trials using 
either the telephone or the Internet 24 hours a day, 7 
days a week. Telephone users are assigned a site-specific 
telephone access code, which allows them to phone into 
the system to randomize patients. Callers are greeted by 
a computer-generated voice that guides them through 
the process, similar to telephone banking. Callers enter 
data that is relevant to the randomization process using 
the number pad of their touch-tone phone. Once all ques- 
tions have been answered, the computer system performs 
the randomization and assigns the patient to a treatment 
group. A computer-generated voice then tells the caller 
to which treatment group he or she has been assigned. Si- 
milarly, online users are assigned a username and pass- 
word to gain access to the Web interface for randomiza- 
tion. Users are required to enter data that is relevant to 
the randomization process via the Web page. Once all 
questions have been answered, the computer system per- 
forms the randomization and assigns the patient to the 
treatment group. The treatment allocation is displayed 
on the Web page for the user. 
Trial Organization


When planning a randomized trial, it is important to con- 
sider how it will be organized prior to starting. An advan- 
tage to conducting a multicenter randomized trial is that it 
increases the number of patients available to participate in 
the trial and, as a result, decreases the amount of time re- 
quired to complete an accrual target. Another advantage of 
a multicenter trial is that it allows orthopaedic surgeons 
with similar research interests to meet, exchange ideas, 
and pursue further collaborations with the aim of improving
surgical care to trauma patients.” In addition, multicenter
trials improve the generalizability of a study. A potential
disadvantage of conducting a multicenter RCT and
using a central methods center is the increased cost.
When applying for funding, orthopaedic traumatologists
need to include the cost of staffing and maintaining a central
methods center in their applications. These items, including
the role of the methods center and budgeting,
are discussed in Chapter 31.


Conclusions


This chapter has discussed several important items to consider
when planning an RCT. Taking the time to plan the
lead to a higher-quality trial.
“Somewhere, something incredible is waiting to be known.”
— Carl Sagan


Summary


Key issues in designing nonrandomized studies in surgery
are reviewed in this chapter. Types of nonrandomized studies,
including cohort, case-control, and cross-sectional,
are described along with their respective strengths and
weaknesses; examples of each are provided. Additionally,
a comparison of study designs is given, focusing on the
role of each type of study design. Methods are shared on
how to ensure the strength of your study design, and the
potential limitations in interpreting the results of these
studies are discussed.


Introduction


In discussing research designs, we often pay a great deal of
attention to the planning of randomized studies and less
attention to nonrandomized study designs. As there are
circumstances when conducting a nonrandomized study
may be preferable, or when a randomized study is not possible,
it is important to understand nonrandomized study
designs.!

Nonrandomized studies are often used to provide preliminary
evidence on a topic that can be useful in planning
future randomized studies. For example, we can gain information
from existing, large, administrative databases. Resources
for conducting a randomized study may be limited,
so we may consider conducting a nonrandomized study to
begin to understand a topic. Other reasons for choosing a
nonrandomized study are to examine rare outcomes, the
effects of harmful exposures, or the long-term effects of
an exposure.

Each type of design has its own role, as well as strengths
and weaknesses. In nonrandomized studies, we can control
for known factors other than the exposure of interest
that may be related to the outcome. However, these studies
do not benefit from the ability of randomization to balance
unknown factors between comparison groups. These unknown
factors can introduce bias into the study, and limit
our interpretation of the study results.

It is important to understand both randomized and nonrandomized
studies in choosing the most appropriate
study design for a particular research question. Nonrandomized
studies are experiments in which assignment of patients
to the intervention group is at the convenience of the
investigator or according to a preset plan that does not conform
to the definition of random.’ In this chapter, we focus
on three types of nonrandomized study designs commonly
reported in the medical literature.


Nonrandomized Studies


Nonrandomized studies include various types of epidemiological
studies that are not randomized controlled
trials. These include observational studies and experimental
studies, called controlled clinical trials, that do not use
randomization. The use of nonrandom allocation methods
in controlled clinical trials is problematic in that it can introduce
systematic differences between comparison groups
that bias the results.?


Jargon Simplified: Nonrandomized Trial

A nonrandomized trial is an “experiment in which assignment
of patients to the intervention groups is at
the convenience of the investigator or according to a preset
plan that does not conform to the definition of
random.”


Observational studies are classified into descriptive and 
analytic studies. Descriptive studies typically report on 
patterns of a disease occurrence, whereas analytic studies 
test hypotheses on the relationship between various factors
and health status.*° Analytic designs include cohort,
case-control, and cross-sectional studies.*° This discus- 
sion will focus on these three designs (Fig. 29.1). Table 
29.1 summarizes the various features of these designs. 
Table 29.1 Comparison of Study Designs

                                               Cross-sectional   Case-Control   Cohort
Suitable for:
Investigation of rare disease                  -                 +++++          -
Investigation of rare cause                    -                 -              +++++
Testing multiple effects of cause              ++                -              +++++
Study of multiple exposures and determinants   ++                +++++          +++
Measurement of time relationship               -                 +a             +++++
Direct measurement of incidence                -                 +b             +++++
Investigation of long latent periods           -                 +++            -
Probability of:
Selection bias                                 Medium            High           Low
Recall bias                                    High              High           Low
Loss to follow-up                              Not applicable    Low            High
Confounding                                    Medium            Medium         Low
Time required                                  Medium            Medium         High
Cost                                           Medium            Medium         High

Source: From Beaglehole R, Bonita R, Kjellstrom T. Basic Epidemiology. Geneva, Switzerland: World Health Organization; 1993. Reprinted
by permission. Abbreviations: +, degree of suitability; -, not suitable; a, if prospective; b, if population-based.


Jargon Simplified: Observational Studies

In observational studies, exposure to treatment is not
under the control of the investigator. Subjects either
self-select or are naturally exposed to the treatment.


Key Concepts: Types of Observational Analytical
Studies

• Cohort study
• Case-control study
• Cross-sectional survey


[Figure: analytical studies branch into cross-sectional comparative studies, case-control studies, and cohort studies.]

Fig. 29.1 Types of observational studies incorporating an
analytical design.


Cohort Study


A cohort study is a type of analytic, observational epidemiologic
study that is used to compare the rates of an outcome
of interest between groups that are exposed or not
exposed to a factor of interest.° A prospective cohort study
is usually considered the most definitive type of observational
study. A retrospective cohort compares groups of
exposed and unexposed individuals from the time of exposure
up to the present to determine outcome rates.° The
retrospective design may be limited by the availability of
information on exposure and other variables that may be
relevant.° The remaining discussion will focus on the prospective
cohort design because it is considered a stronger
study design than a retrospective cohort (Fig. 29.2).


Jargon Simplified: Cohort Study

A cohort study is a “prospective investigation of the factors
that might cause a disorder in which a cohort of individuals
who do not have evidence of an outcome of interest
but who are exposed to the putative cause are
compared with a concurrent cohort who are also free
of the outcome but not exposed to the putative cause.
Both cohorts are then followed to compare the incidence
of the outcome of interest.”


When conducting a prospective cohort study, one begins
with a group of individuals who do not have evidence of
the outcome of interest. This group is further classified
into those who are exposed to the factor of interest and
those who are not. The cohort is then followed over time
to determine the rate of the outcome of interest among
those exposed and unexposed.
Fig. 29.2 The design of a cohort study. (From Beaglehole
R, Bonita R, Kjellstrom T. Basic Epidemiology. Geneva,
Switzerland: World Health Organization; 1993. Reprinted
by permission.)


Examples from the Literature: An Example of a Cohort
Study

Source: Geubbels EL, Wille JC, Nagelkerke NJ, Vandenbroucke-Grauls
CM, Grobbee DE, de Boer AS. Hospital-related
determinants for surgical-site infection following
hip arthroplasty. Infection Control & Hospital Epidemiology
2005; 26(5):435-441.°

Abstract

Objective: To determine hospital-related risk factors for
surgical-site infection (SSI) following hip arthroplasty.

Design: Prospective, multicenter cohort study based on
surveillance data and data collected through a structured
telephone interview. With the use of multilevel logistic
regression, the independent effect of hospital-related
characteristics on SSI was assessed.

Setting: Thirty-six acute care hospitals in the Dutch surveillance
network for nosocomial infections (PREZIES),
from 1996 to 2000.

Patients: Thirteen thousand six-hundred eighty patients
who underwent total or partial hip arthroplasty.

Results: A high annual volume of operations was associated
with a reduced risk of SSI (risk-adjusted risk ratio
[RR] per 50 extra operations, 0.85; 95% confidence interval
[CI95], 0.74-0.97). With each extra full-time-equivalent
infection control staff member per 250 beds available
for prevention of SSI, the risk for SSI was decreased
(RR, 0.48; CI95, 0.16-1.44), although the decrease was
not statistically significant. Hospital size, teaching status,
university affiliation, and number of surgeons and
their years of experience showed no important association
with the risk of SSI.

Conclusion: Undergoing surgery in a hospital with a low
volume of operations increases a patient's risk of SSI.


The prospective cohort design is similar to a randomized
controlled trial in that groups exposed and unexposed to
a factor of interest are followed through time to determine
the rates of an outcome of interest. However, the main difference
between the designs is that allocation to the exposure
or intervention is based on chance in a randomized
controlled trial, and controlled by investigators.’ The purpose
of randomization is to control for both measured
and unmeasured baseline factors that may influence the
outcome of interest.”

In contrast, allocation to an exposure in a prospective co- 
hort study is based on decisions made by providers or pa- 
tients.’ Because allocation is not based on chance, the com- 
parison groups may differ in characteristics at baseline that 
have independent effects on the outcome of interest.’ These 
are called confounding factors or confounders. If potential 
confounders have only a weak effect on the outcome or are 
distributed evenly between groups, they will introduce less 
bias.® Confounding due to known factors can be assessed or 
reduced by using the analytical techniques of regression 
and stratification, which are described elsewhere.’ 
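
As a rough illustration of stratification, the sketch below compares a crude odds ratio with a Mantel-Haenszel odds ratio pooled across strata of a known confounder. The Mantel-Haenszel estimator is one common stratified method and is used here as an assumption; the counts are invented so that the exposure has no effect within either stratum.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for one 2 x 2 table.

    a, b: exposed with / without the outcome
    c, d: unexposed with / without the outcome
    """
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio across strata of (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical counts, stratified by baseline risk (the confounder).
# High-risk patients are both more often exposed and more often have the outcome.
strata = [
    (5, 95, 10, 190),   # low-risk stratum
    (40, 60, 10, 15),   # high-risk stratum
]

# Collapsing the strata gives the crude (unadjusted) table.
crude_table = tuple(sum(cells) for cells in zip(*strata))
crude = odds_ratio(*crude_table)
adjusted = mantel_haenszel_or(strata)

print("Crude OR:          ", round(crude, 2))
print("Mantel-Haenszel OR:", round(adjusted, 2))
```

Here the crude odds ratio is close to 3, whereas the stratified estimate is 1, so the apparent association is explained by the confounder. Stratification of this kind can only address confounders that were identified and measured.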


Jargon Simplified: Confounding Variable 

A confounding variable is a factor that distorts the true 
relationship of the study variable of interest by virtue 
of also being related to the outcome of interest. Con- 
founding variables are often unequally distributed 
among groups being compared.” 


The term selection bias refers to an error in selecting comparison
groups that causes differences in prognostic factors
between the groups. In other words, selection bias
can lead to confounding.’ To minimize selection bias, in- 
vestigators can restrict the cohort to include only indivi- 
duals with a specific diagnosis or characteristics.” How- 
ever, these restrictions reduce sample size and make the 
findings less generalizable.” 


Jargon Simplified: Selection Bias 

Selection bias is “a systematic error in creating interven- 
tion groups, causing them to differ with respect to prog- 
nosis. That is, the groups differ in measured or unmea- 
sured baseline characteristics because of the way in 
which participants were selected for the study or as- 
signed to their study groups.”'° 


Key Concepts: Key Steps in Conducting a Surgical 
Cohort Study 

1. Assemble the cohort 

2. Determine exposure status 

3. Measure outcomes 
Critical Appraisal of Cohort Studies


Reader’s guides have been published in the British Medical 
Journal to assist readers with critical appraisal of cohort 
studies.’ 


Examples from the Literature: Critical Appraisal of a
Cohort Study

When assessing a cohort design:
1. What comparison is being made?
   • Clear definition of why the two groups were selected
     and how they were defined
2. Does the comparison make clinical sense?
   • Discussion of the clinical context and justification
     for the comparison
3. What are the potential selection biases?
   • Carefully consider possible sources of bias’

When assessing the potential for confounding:
1. Has there been a systematic effort to identify and
   measure potential confounders?
   • Comprehensive review of the literature to identify
     factors, including those related to demographics,
     social characteristics, medical history, and exposure
     to drugs
2. Is there information on how the potential confounders
   are distributed between the comparison groups?
   • Provide a table on potential confounders for the
     two comparison groups
3. What methods are used to assess differences in the
   distribution of potential confounders?
   • Statistical significance, standardized differences,
     or multivariate assessment?

When assessing analytical strategies to reduce confounding:
1. Are the analytic strategies clearly described?
   • Describe which strategy was used and how confounders
     were incorporated
2. Do different analytical strategies used yield consistent
   results?
   • Review limitations, advantages, and assumptions
     of each strategy if different
   • Compare adjusted and unadjusted estimates of the
     effect
3. Are the results plausible?
   • Place results in context of other similar studies?


Case-Control Study 


A case-control study of a surgical intervention selects a 
group of people with an outcome of interest and a control 
group of people without the outcome of interest, and com- 
pares the proportions in each group exposed to a factor of 
interest (Fig. 29.3).*° This design is best suited to rare out- 
comes for which individuals are likely to seek care, and for 
diagnoses that occur within a relatively short time after the 
onset of symptoms.° 


Jargon Simplified: Case-Control Study 

A case-control study is “a study designed to determine 
the association between an exposure and outcome in 
which patients are sampled by outcome (that is, some 
patients with the outcome of interest are selected and 
compared with a group of patients who have not had 
the outcome), and the investigator examines the propor- 
tion of patients with the exposure in the two groups.” 


Cases can be identified by using established registries, in 
hospitals, or physicians’ offices, or through the community 
by contacting patient organizations, schools, workplaces, 
or the military. Ideally, controls should be similar to cases 
in that they are individuals that would have been selected 
as cases if they had developed the outcome of interest.° For 
example, controls can be chosen from random samples of 
the population from which cases would have arisen, or 
people seeking care at the same institution for unrelated 
diagnoses. The choice of control group is a crucial decision, 
and it is important to avoid confounding factors (i.e., other 
factors related to the outcome of interest). 
Fig. 29.3 The design of a case-control study. (From Beaglehole
R, Bonita R, Kjellstrom T. Basic Epidemiology. Geneva,
Switzerland: World Health Organization; 1993. Reprinted
by permission.)
Examples from the Literature: An Example of a
Case-Control Study

Source: Silber JH, Rosenbaum PR, Trudeau ME, et al. Pre- 
operative antibiotics and mortality in the elderly. Annals 
of Surgery 2005; 242(1):107-14."! 

Abstract 

Objective and Background: It is generally thought that 
the use of preoperative antibiotics reduces the risk of 
postoperative infection, yet few studies have described 
the association between preoperative antibiotics and 
the risk of dying. The objective of this study was to deter- 
mine whether preoperative antibiotics are associated 
with a reduced risk of death. 

Methods: We performed a multivariate matched, popu- 
lation-based, case-control study of death following sur- 
gery on 1362 Pennsylvania Medicare patients between 
65 and 85 years of age undergoing general and orthope- 
dic surgery. Cases (681 deaths within 60 days from hos- 
pital admission) were randomly selected throughout 
Pennsylvania using claims from 1995 and 1996. Models 
were developed to scan Medicare claims, looking for 
controls who did not die and who were the closest 
matches to the previously selected cases based on preo- 
perative characteristics. Cases and their controls were 
identified, and charts were abstracted to define antibio- 
tic use and obtain baseline severity adjustment data. 
Results: For general surgery, the odds of dying within 60 
days were less than half in those treated with preoperative
antibiotics within 2 hours of incision as compared
with those without such treatment (odds ratio = 0.44;
95% confidence interval, 0.32-0.60; P < 0.0001). For
orthopedic surgery, no significant mortality reduction 
was observed (OR = 0.85; 95% confidence interval, 
0.54-1.32; P < 0.464). 

Interpretation: Preoperative antibiotics are associated 
with a substantially lower 60-day mortality rate in el- 
derly patients undergoing general surgery. In patients 
who appear to be comparable, the risk of death was 
half as large among those who received preoperative 
antibiotics. 


After selecting the case and control groups, investigators 
must ascertain the exposure to the factor of interest in 
both groups. Investigators can collect information on ex- 
posures by asking the participants, using existing records, 
or performing physical measurements or laboratory tests.’ 
When using existing records, it is important to consider 
whether the exposure would have been just as likely to 
be routinely recorded for cases as for controls. 

Because the investigators often look backward in time to 
determine exposure to the factor of interest, they depend
on the participants’ memory of the exposure or on
the availability of accurate data. When a patient is more 
likely to remember an exposure if they have experienced 
the outcome of interest, this can affect the comparison be- 
tween groups by introducing recall bias. To avoid this, in- 
vestigators can attempt to choose a group of controls that 
are just as likely as cases to report exposures. 


Jargon Simplified: Recall Bias 

“Recall bias occurs when patients who experience an ad- 
verse outcome have a different likelihood of recalling an 
exposure than the patients who do not have an adverse 
outcome, independent of the true extent of the expo- 
sure.” 


Similar to cohort studies, it is important to collect informa- 
tion on any potential confounding variables in case-con- 
trol studies. Methods of controlling for confounders in- 
clude matching cases and controls, or using statistical 
maneuvers, or both.” Matching does have its limitations, 
and investigators should weigh the advantages against its 
disadvantages before deciding to match cases and controls. 
If chosen, matching should involve only a few variables, in- 
clude those related to both the exposure and disease, be 
strongly related to the disease, and be feasible in terms of 
cost and recruitment.° 

It is best to select newly diagnosed, or incident cases 
rather than cases that have had the outcome for a longer 
time, called prevalent cases. With prevalent cases, it is 
more difficult to interpret cause and effect, as some expo- 
sures may have occurred after the outcome was present 
rather than preceding it.4° Additionally, people who have 
had the outcome for a long duration are likely to be overre- 
presented among prevalent cases and can bias the associa- 
tion between exposure and outcome.’ 

In case-control studies, the relationship between the 
exposure and the outcome of interest is represented by a 
calculation called an odds ratio (OR). The OR is the ratio 
of the odds of an event occurring among those exposed 
to the odds of an event occurring among those unexposed. 
This is called the disease odds ratio, which is the same as 
the exposure odds ratio. For example, if the odds of an 
event are three times higher in the exposed compared 
with the unexposed, it is also true that the odds of expo- 
sure are three times higher among the cases compared 
with the controls. 


Jargon Simplified: Odds Ratio 

The odds ratio is “a ratio of the odds of an event in an ex- 
posed group to the odds of the same event in a group 
that is not exposed.” 


Key Concepts: Odds Ratio (Table 29.2) 


Table 29.2 The 2 × 2 Table 

                      Outcome 
                   Yes       No 
Exposure    Yes     a         b 
            No      c         d 

Odds ratio = (a/b) / (c/d) = ad / cb 



Source: From Guyatt GH, Rennie D, eds. Users’ Guides to the 
Medical Literature: A Manual for Evidence-Based Clinical Practice. 
Chicago, IL: AMA Press; 2001. Reprinted by permission. 
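
As a minimal sketch of the calculation behind Table 29.2, the function below computes the odds ratio from the four cells a, b, c, and d, once as the disease odds ratio and once as the exposure odds ratio, to illustrate that the two agree. The function name and the cell counts are hypothetical and chosen only for illustration; they do not come from any study.

```python
def odds_ratio(a, b, c, d):
    """Odds ratios for a 2 x 2 table laid out as in Table 29.2.

    a: exposed, outcome yes      b: exposed, outcome no
    c: unexposed, outcome yes    d: unexposed, outcome no
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("this simple sketch needs four positive cell counts")
    disease_or = (a / b) / (c / d)   # odds of the outcome: exposed vs unexposed
    exposure_or = (a / c) / (b / d)  # odds of exposure: cases vs controls
    return disease_or, exposure_or


# Hypothetical counts (assumption, for illustration only).
disease_or, exposure_or = odds_ratio(a=30, b=70, c=10, d=90)
print(round(disease_or, 2), round(exposure_or, 2))
```

Both values reduce algebraically to ad / cb, which is why a case-control study, which samples on the outcome rather than the exposure, can still estimate the association.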
Key Concepts: Key Steps in Conducting a Surgical 
Case-Control Study 
1. Select cases 
   • Among people seeking medical care for the outcome 
     of interest 
   • Preferably those newly diagnosed, using established 
     diagnostic criteria 
2. Select controls 
   • Relatively similar to cases but without the outcome 
     of interest 
   • Consider age, sex, and other medical conditions 
3. Ascertain exposure to risk factors and confounding 
   variables 


Cross-Sectional Study 


A cross-sectional study measures the prevalence of an out- 
come at the same time as it measures an exposure of inter- 
est. Cross-sectional studies are best used for exposures 
that are fixed characteristics, or to provide initial informa- 
tion about the association of an exposure and an outcome.’ 
They are often used when case-control or cohort studies 
are less feasible, such as for conditions for which patients 
do not seek medical care at the outset, or when large sample 
sizes or long follow-up would be required.° These designs 
are also useful for public health and health care planning. 


Jargon Simplified: Cross-Sectional Survey 

A cross-sectional survey is “the observation of a defined 
population at a single point in time or during a specific 
time interval. Exposure and outcome are determined si- 
multaneously.”” 


Cross-sectional studies are often based on a sample of the 
general population, offering more generalizable results, 
and can be conducted over a short time at low cost. Because 
the exposure and outcome are assessed at the same time, 
this design does not lend itself well to determining 
cause and effect.*° Additionally, because they are based 
on prevalence they tend to include more individuals who 
have an outcome of long duration, as those who recover 
or die from it are less likely to be included, which can dis- 
tort information about the exposure and incidence of the 
outcome. 


Jargon Simplified: Prevalence 

Prevalence refers to the total number of people with a 
disease or condition in a certain population at a certain 
timeframe. This includes both people who are newly di- 
agnosed and those who have had the disease or a condi- 
tion for a long time. Prevalent cases must have been in- 
cident cases at some earlier point. 
The study population should be carefully chosen. The 
study population may be chosen based on a defined geo- 
graphic area, or a more complicated sampling plan may 
be used.° Exposures and outcome status are measured in 
similar ways as in cohort and case-control studies, by 
using existing records, physical measurements, or labora- 
tory tests; however, questionnaires are used most often.’ 
Introductory texts on observational methods describe 
sampling and measurement methods in detail.’ It is also 
important to use preset diagnostic criteria for outcomes, 
and control for confounding variables in statistical analy- 
sis. 


Examples from the Literature: An Example of a Cross- 
Sectional Study 

Source: Vallano A, Arnau JM, Miralda GP, Perez-Bartoli J. 
Use of venous thromboprophylaxis and adherence to 
guideline recommendations: a cross-sectional study. 
Thromb J 2004;2(1):3.! 

Abstract 

Background: Consensus Conferences and Guidelines for 
deep vein thrombosis prophylaxis have been published 
that recommend the use of prophylactic heparins in 
patients with risk of venous thromboembolism (VTE). 
The aim of this study was the assessment of the prophy- 
laxis of VTE and the adherence to accepted guideline 
recommendations throughout the hospital. 

Methods: A cross-sectional study was performed in a 
teaching hospital after guidelines were implemented. 
Patients’ risk factors of deep vein thrombosis, risk cate- 
gories of patients, and prophylaxis used in different 
wards were recorded. Appropriate adherence to the 
guidelines was analyzed. 

Results: Of 397 patients, prophylaxis was used in 231 
patients (58%), and low-molecular-weight heparins 
(LMWH) were used in 224 of them (97%). Patients 
with prophylaxis had a higher mean number of risk fac- 
tors (SD) than those without prophylaxis [3.1 (1.4) vs 1.9 
(1.4); P < 0.05]. Prophylaxis was used in 72% and 90% of 
moderate and high-risk patients, respectively. Appropri- 
ate adherence to all guideline recommendations was ob- 
served in 42% of patients. Adherence to guidelines was 
high as regards the use of prophylaxis according to pa- 
tients’ risk factors (78%) and the use of appropriate types 
of prophylaxis (99%), but was low regarding appropriate 
heparin dosage (47%) and preoperative dosage (37%). 
Appropriate prophylaxis use was higher in critical care 
and surgical wards than in medical wards. 

Conclusion: Prophylaxis of VTE is generally used in risk 
patients, but appropriate adherence to guidelines is less 
frequent and variable among different wards. Continu- 
ing medical education, discussion and dissemination of 
guidelines, and regular clinical audit are necessary to 
improve prophylaxis of VTE in clinical practice. 
Key Concepts: Key Steps in Conducting a Surgical 
Cross-Sectional Study 

1. Select study population 

2. Measure exposure 


Conclusions 


Analytic observational studies include cross-sectional stu- 
dies, case-control studies and cohort studies. In contrast to 
randomized trials, treatment allocation is not under the 
control of the investigator, which may introduce inequality 
of the treatment groups as to prognostic factors. Methodo- 
logical pitfalls of observational studies include recall bias 
for cross-sectional studies, selection bias and recall bias 
for case-control studies, and loss to follow-up for cohort 
30 
Study Sample Size 


“Size matters!” 
— Lisa Holt 


Summary 


Guidelines for the required number of patients needed in 
different types of clinical study designs are provided in 
this chapter. Case-specific examples are used to demonstrate 
how to perform a sample size calculation. 


Introduction 


Calculating the study sample size is a critical component in 
the planning phase of a study. Knowing the required sample 
size for a proposed trial can help to answer two important 
questions: 

• Is it worth the effort doing the study that I am planning? 
• Is my proposed study feasible? 

Study sample size has profound implications on the resources 
needed to perform the study, specifically on the amount of 
required funding, the size of the team, and the expected 
duration of the study. 


Study Types 


Sample size calculations are different for different types of 
study designs. Therefore, before calculating study sample 
sizes, one must first determine the study design that 
matches the research question. 

• Are you trying to set up a therapeutic study, in which you 
  are comparing the outcome of two or more different 
  interventions? 

• Or, are you planning a prognostic study in which you 
  want to assess the prognostic value of multiple factors 
  (e.g., age, gender, comorbidities, type of intervention, 
  etc.) on a specific outcome parameter (e.g., mortality, 
  union rate, infection rate, etc.) in one group of patients? 


Some study characteristics require a special type of sample 
size calculation that is beyond the scope of this chapter: 
paired values, categorical values with more than two options, 
and comparisons of more than two groups. 


Key Concepts: Sample Size Calculation Depends on the 
Clinical Study Type 

• Therapeutic (comparative) study 

• Prognostic study 


Therapeutic Study: Comparing Two Interventions 
Scenario 


You want to compare plating and intramedullary nailing 
for the treatment of proximal tibia fractures. Your out- 
comes of interest are revision surgery rates, nonunion 
rates, infection rates (proportions = categorical dichoto- 
mous outcome), and Short Form-36 (SF-36) and short 
musculoskeletal function assessment questionnaire 
(SMFA) scores (mean values = continuous outcomes). 


Parameters Needed and Their Calculation 


Primary and Secondary Outcome Parameters 

Most studies will look at multiple outcome parameters. So 
why do primary and secondary outcome parameters have 
to be defined? Why not just look at all parameters and 
then see which ones turn out to be significant? 

The problem is that if multiple tests are performed, the 
likelihood of one of them being significant just by chance 
alone increases with the number of tests. Typically, if one 
comparison is being performed, an α level of 0.05 is used as 
an arbitrary cutoff for significance of the P value, meaning 
we are willing to accept a false positive rate of 5%. However, 
if multiple tests are performed, the α significance level has 
to be lowered to account for the increased possibility of one 
of the comparisons being significant by chance alone. One of 
the most common ways of adjusting the significance level is 
the so-called Bonferroni correction.' Applying the Bonferroni 
correction means dividing the standard significance level α by 
the number of comparison tests performed. Practically 
speaking, that means that if you are performing two 
comparisons (e.g., comparing infection rates and reoperation 
rates after plating versus nailing of proximal tibia 
fractures), you can only consider a P value of <0.025 as 
significant (α = 0.05/2). Analogously, if you are performing 
five comparisons, your significance cutoff for the P value 
needs to be 0.01. The downside of this is obvious: 
you need bigger sample sizes if your target P value is lower. 
The solution to this is to define a priori (before the start 
of the study) one primary outcome for which an α level 
of 0.05 will be used to calculate the sample size and to 
interpret the results. For the remaining secondary outcomes, 
the α-value cutoff for interpretation of the results needs to 
be divided by the number of comparisons. To prevent the 
investigators from selecting the primary outcome after 
the study is performed (e.g., declaring as primary whatever 
showed up with a P < 0.05 in the analysis), studies can be 
registered before their start. 


Key Concepts: Bonferroni correction 

• Performing multiple comparisons raises the likelihood 
of having a P value below 0.05 by chance alone. To ad- 
just for multiple comparisons the significance level 
(P value) needs to be lowered, for example, by using 
the Bonferroni correction: divide 0.05 by the number 
of comparisons you perform and use this number as 
your cutoff for statistical significance instead of 0.05. 
By determining a primary outcome a priori (before the 
start of the study), a P value of 0.05 can be used for the 
primary outcome in the data analysis. Bonferroni cor- 
rection has to be performed for all remaining second- 
ary outcomes to avoid false positive results. 
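
The effect of multiple comparisons can be illustrated with a short sketch. It assumes that the comparisons are statistically independent, which is a simplifying assumption not made in the text, and the function names are hypothetical. Under that assumption the chance of at least one false positive among m tests is 1 − (1 − α)^m, and the Bonferroni cutoff is α / m.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance cutoff after Bonferroni correction."""
    return alpha / n_comparisons


def familywise_error_rate(alpha, n_comparisons):
    """Chance of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** n_comparisons


for m in (1, 2, 5):
    uncorrected = familywise_error_rate(0.05, m)
    cutoff = bonferroni_alpha(0.05, m)
    corrected = familywise_error_rate(cutoff, m)
    print(f"{m} comparisons: cutoff {cutoff:.3f}, "
          f"uncorrected risk {uncorrected:.3f}, corrected risk {corrected:.3f}")
```

With five uncorrected comparisons the risk of at least one chance finding is well above 5%, whereas testing each comparison at the Bonferroni cutoff keeps the overall risk at or just below 5%.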


Continuous versus Categorical Outcomes 

It is important to realize that sample size calculations as 
well as data analyses are different for continuous and cate- 
gorical outcomes. Continuous outcomes are outcomes 
expressed as mean values, for example, a visual analogue 
pain scale or functional outcome scores (e.g., SF-36, 
SMFA, Disabilities of the Arm, Shoulder and Hand [DASH] 
Outcome Measure, etc.). Categorical outcomes are out- 
comes in which the results are divided into different cate- 
gories (e.g., excellent, good, average, fair). Categorical out- 
come results are usually expressed as percentages. A very 
common subtype of categorical outcomes is the so-called 
dichotomous (binary) outcome in which the result of the 
individual patient falls into one of two categories (usually 
yes or no). Classic examples are reoperation rates, non- 
union rates, and infection rates; either an event happens 
or it does not (yes or no). Sample size calculations are dif- 
ferent for continuous and dichotomous outcomes and are 
presented separately in the following paragraphs. 


Jargon Simplified: Continuous Outcome 

A continuous outcome is a numbered value that is as- 
signed to the individual patient, for example, a rating 
on the visual analogue pain scale, a range of motion de- 
fined, or an SF-36 value. The outcome can be any value in 
a certain range. 


Jargon Simplified: Categorical Outcome 

In a categorical outcome, the individual patient is as- 
signed to a certain category, for example, the quality of 
his or her fracture reduction can be graded as unaccep- 
table, fair, good, or excellent. 


Jargon Simplified: Dichotomous Outcome 

A dichotomous outcome is a categorical outcome with 
only two categories, such as the presence or absence of 
infection. Be aware that categorical outcomes are sum- 
marized as proportions (percentages); therefore, they 
are sometimes confused with continuous outcomes. 


Sample Size Calculation for a Continuous Outcome 


Scenario 

We want to compare SF-36 physical functioning subscores 
(SF-36 PF) after plating versus nailing of proximal tibia 
fractures. 


Parameters Needed and Their Calculation 

To calculate the required sample size when comparing 
whether nailing or plating results in a significantly higher 
SF-36 PF score, you have to define the control group 
(nailing) and the intervention group (plating). Furthermore, 
to develop the equation, you need to: 

1. Determine a value for the expected standard deviation 
   σ of the SF-36 PF score in your control group (nailing). 
   Typically, this value can be found in the literature, or 
   ideally can be determined by an initial pilot study. 

2. Define a difference Δ you want to detect, that is, a 
   difference by which plating raises or lowers the SF-36 PF 
   score. This should be a difference that is considered 
   clinically relevant. Typically, when looking at functional 
   outcome scores, ½ of a standard deviation is considered 
   relevant. 

3. Designate an α value (the significance level for the 
   P value) and the power (1 − β). Typically, an α value of 
   0.05 and a power of 80% (0.8) are used. That means that 
   you are willing to accept a probability of 5% of finding 
   a significant difference when none actually exists (false 
   positive). A power of 80% means that you have an 80% 
   probability of detecting a difference (having a 
   significant result) when there is one; in turn, that 
   means that there is a 20% probability that your study 
   shows no significant difference even though a difference 
   exists (false negative). Remember that if you have more 
   than one primary outcome parameter, you have to divide the 
   α value by the number of primary outcome parameters. 
If you want to calculate the sample size by hand, you need 
to use the so-called z score. The actual study power that 
corresponds to a certain z score can be looked up in the sta- 
tistical literature? or on the Internet (Keyword: “z table”). 


Key Concepts: Sample Size Calculation for Continuous 
Outcome 


$$n_1 = n_2 = \frac{2\sigma^2 \left(z_{1-\alpha/2} + z_{1-\beta}\right)^2}{\Delta^2}$$

where 
$n_1$ = sample size of group 1 
$n_2$ = sample size of group 2 
$\Delta$ = difference of outcome parameter between groups 
(6 points) 
$\sigma$ = sample standard deviation (12) 
$z_{1-\alpha/2} = z_{0.975} = 1.96$ (for $\alpha = 0.05$) 
$z_{1-\beta} = z_{0.80} = 0.84$ (for $\beta = 0.2$) 


From the equation above, our proposed study will require 
63 patients per treatment arm to have adequate study 
power: $n_1 = n_2 = 2(12^2)(1.96 + 0.84)^2 / 6^2 = 62.7$, 
rounded up to 63. In other words, under the assumption that 
the standard deviation of the SF-36 PF score is 12, with 
63 patients in each group you have an 80% probability of 
detecting a difference of 6 points in SF-36 PF scores 
between nailing and plating for proximal tibia fractures, 
if such a difference truly exists and you accept a false 
positive rate of 5%. 
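
The hand calculation can also be sketched in a few lines of Python. The function name is hypothetical, and instead of looking up z scores in a table the sketch takes them from the standard library's normal distribution, so the z values are not rounded to 1.96 and 0.84. The result is rounded up to the next whole patient.

```python
import math
from statistics import NormalDist


def n_per_group_continuous(sigma, delta, alpha=0.05, power=0.80):
    """Patients per group for comparing two means (two-sided test).

    sigma: expected standard deviation of the outcome
    delta: smallest difference between groups worth detecting
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    n = 2 * sigma ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2
    return math.ceil(n)  # round up: a fraction of a patient cannot be enrolled


# SF-36 PF scenario from the text: sigma = 12, delta = 6 points.
n_sf36 = n_per_group_continuous(sigma=12, delta=6)
print(n_sf36)
```

Halving the detectable difference to 3 points would quadruple the required sample size, because the difference enters the formula squared.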


Sample Size Calculation for a Categorical Dichotomous 
Outcome 


Scenario 
We want to compare reoperation rates after plating versus 
nailing of proximal tibia fractures. 


Parameters Needed and Their Calculation 

To calculate the required sample size when comparing 
whether nailing or plating results in a significantly higher 
reoperation rate, for the equation you will need to: 

1. Specify the proportion p₁ of the event rate (reoperation 
   rate) in your control group (nailing). Again, typically 
   this value can be found in the literature, or ideally 
   from an initial pilot study. 

2. Define a difference Δ you want to detect, that is, a 
   difference by which plating raises or lowers the 
   reoperation rate. For the equation you will need not only 
   the difference, but also the proportion p₂ of patients 
   that you hypothesize to undergo a reoperation after 
   plating. 

3. Designate an α value (the significance level for the 
   P value) and the power (1 − β). Typically, an α value of 
   0.05 and a power of 80% (0.8) are used. 
If you want to calculate the sample size by hand, you need 
to use the so-called z score. The actual study power that 
corresponds to a certain z score can be looked up in the sta- 
tistical literature” or on the Internet (Keyword: “z table”). 


Key Concepts: Sample Size Calculation for 
Dichotomous Outcome 


$$n_1 = n_2 = \frac{\left[(2 p_m q_m)^{1/2}\, z_{1-\alpha/2} + (p_1 q_1 + p_2 q_2)^{1/2}\, z_{1-\beta}\right]^2}{\Delta^2}$$

where 
$n_1$ = sample size of group 1 
$n_2$ = sample size of group 2 
$p_1, p_2$ = sample probabilities (5% and 10%) 
$q_1, q_2$ = $1 - p_1$, $1 - p_2$ (95% and 90%) 
$p_m$ = $(p_1 + p_2)/2$ (7.5%) 
$q_m$ = $1 - p_m$ (92.5%) 
$\Delta$ = difference = $p_2 - p_1$ (5%) 
$z_{1-\alpha/2} = z_{0.975} = 1.96$ (for $\alpha = 0.05$) 
$z_{1-\beta} = z_{0.80} = 0.84$ (for $\beta = 0.2$) 


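Thus, we need about 434 patients per treatment arm to have 
adequate study power for our proposed trial: 


$$n_1 = n_2 = \frac{\left[(2 \times 0.075 \times 0.925)^{1/2} \times 1.96 + (0.05 \times 0.95 + 0.1 \times 0.9)^{1/2} \times 0.84\right]^2}{0.05^2} = 433.9$$

which is rounded up to 434. 


Hence, under the assumption that the reoperation rate after 
nailing of proximal tibia fractures is 10%, with 434 patients 
in each group you have an 80% probability of detecting that 
plating lowers the reoperation rate by 5 percentage points 
(to 5%), if such a difference truly exists and you accept a 
false positive rate of 5%. The formula is symmetric in p₁ and 
p₂, so it does not matter which group is assigned the 5% and 
which the 10%. 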
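
A minimal sketch of the same calculation in Python follows. The function name and the optional `z_alpha` and `z_beta` parameters are hypothetical conveniences: they let you plug in the rounded table values 1.96 and 0.84, while the default takes unrounded z scores from the standard library. The function returns the unrounded result so that the effect of rounding the z scores is visible.

```python
import math
from statistics import NormalDist


def n_per_group_dichotomous(p1, p2, alpha=0.05, power=0.80,
                            z_alpha=None, z_beta=None):
    """Unrounded patients per group for comparing two proportions."""
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    # Use unrounded normal quantiles unless table values are supplied.
    if z_alpha is None:
        z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    if z_beta is None:
        z_beta = NormalDist().inv_cdf(power)
    q1, q2 = 1 - p1, 1 - p2
    p_m = (p1 + p2) / 2   # mean of the two event rates
    q_m = 1 - p_m
    delta = p2 - p1       # difference to detect
    root_sum = (math.sqrt(2 * p_m * q_m) * z_alpha
                + math.sqrt(p1 * q1 + p2 * q2) * z_beta)
    return root_sum ** 2 / delta ** 2


# Reoperation scenario from the text: event rates of 5% and 10%.
n_table_z = n_per_group_dichotomous(0.05, 0.10, z_alpha=1.96, z_beta=0.84)
n_exact_z = n_per_group_dichotomous(0.05, 0.10)
print(round(n_table_z, 1), round(n_exact_z, 1), math.ceil(n_exact_z))
```

With the rounded table values the formula gives roughly 433.9 patients per arm; with unrounded z scores it gives a slightly larger number, so rounding up from the unrounded figure is the more cautious choice.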


Accounting for Dropouts 

After calculating the sample size (n), you also have to 
account for potential dropouts of study patients. The 
estimated dropout rate will depend on patient group, practice 
setting, length of study, and complexity of follow-up: 


$$n_{\text{corrected}} = \frac{n}{1 - \text{dropout}}$$
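
As a brief illustration of the dropout correction, the sketch below inflates a calculated sample size for an expected dropout rate. The function name is hypothetical, and the 20% dropout rate is an assumed value chosen only for illustration; the text does not specify one.

```python
import math


def inflate_for_dropout(n, dropout_rate):
    """Patients to enroll per group so that about n complete follow-up."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be at least 0 and below 1")
    # Round to 9 decimals first so floating-point noise cannot add a patient.
    return math.ceil(round(n / (1 - dropout_rate), 9))


# 63 patients per group from the continuous example, assumed 20% dropout.
n_enroll = inflate_for_dropout(63, 0.20)
print(n_enroll)
```

Note that dividing by (1 − dropout) gives a larger number than simply adding the dropout percentage to n, because the dropouts are lost from the enlarged group.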


Prognostic Study: Assessing the Prognostic Value 
of Multiple Factors on a Specific Outcome 
Parameter 


Scenario 


You want to assess whether age, gender, number of 
comorbidities, smoking history, surgical technique, or type 
of postoperative rehabilitation have an influence on the 
reoperation rate. 
Regression Analysis 


In an observational prognostic study as described in the 
scenario above, the influence of several independent 
variables on one dependent outcome variable can be assessed 
using a so-called regression analysis. If a categorical 
dichotomous outcome variable is used as the dependent 
variable, the type of regression analysis is called logistic 
regression. But how many variables can you use in the 
logistic regression model, and which variables should you 
use if there are too many? 

The first step is to perform a univariable analysis to look 
at how a single variable influences the reoperation rate. 
To evaluate a dichotomous variable such as gender, you 
would perform a chi-square test or Fisher’s exact test to 
compare the proportions of reoperations between males 
and females. To evaluate a continuous variable such as age, 
you would compare the age of the patients who underwent a 
reoperation with the age of the patients who did not need 
a reoperation. After you have done this for all your 
variables of interest, you can rank them in order of 
significance. A univariable analysis is a good first step, 
but it does not tell you which variables are predictors of 
your outcome variable (reoperation) and which are 
confounders. To distinguish between predictors and 
confounders, you can perform a multivariable analysis, which 
is a regression analysis. In the regression analysis you can 
specify multiple potential predictor variables. For each 
predictor variable (gender, age, etc.) you need 10 events 
(reoperations) in your sample.’ That means that with a sample 
of 200 patients and an average reoperation rate of 10% 
(20 patients with a reoperation), you could evaluate two 
predictor variables. Typically, you would choose the two 
variables that turned out to be most significant in your 
univariable analysis. 


Conclusions 


Determining the number of patients required for a study is 
a very important part of planning any trial. Ultimately, a 
sample size calculation is based on many assumptions, 
certainly not more than your “best guess.” A sample size 
range should be presented with your “best case” and “worst 
case” scenario. 
How to Budget for a Research Study 


“Money never made a man happy yet, nor will it. There is nothing in its nature to produce 
happiness. The more a man has, the more he wants. Instead of its filling a vacuum, it 
makes one.” 
— Benjamin Franklin 


Summary 


The purpose of this chapter is to show researchers how to 
prepare accurate budgets and detailed budget justifications 
for different research initiatives. Details on how to 
prepare a budget for a large multicenter trial, how to 
prepare a budget for a small single-center initiative, and 
how to review effectively an offer to participate in a 
clinical trial that is initiated by other academic 
institutions or by industry are provided. 


Introduction 


Central methods centers that coordinate multiple trials 
often have operating budgets of over one million dollars and 
more than 30 research personnel.' Preparing an accurate 
budget for a research trial is a vital, but difficult task. 
The budget should be prepared after a detailed protocol has 
been prepared, prior to the submission of an application 
for research funding. In this chapter, a template for 
preparing detailed budgets for multicenter trials and 
single-center initiatives is provided. Items to consider 
when reviewing an offer to participate in a clinical trial 
that is being coordinated by a colleague at another academic 
institution or by industry are also discussed. 


Preparing a Budget for a Large, Multicenter Randomized Trial 


Preparing an accurate and detailed budget and budget jus- 
tification for a large, multicenter randomized trial can be a 
difficult task. Different granting and funding agencies have 
different format requirements and budget amounts, so it is 
important to format your budget according to their speci- 
fications. A detailed, accurate budget using the template 
described below should be drafted before reformatting 
the budget to meet the specifications of a specific funding 
agency. Determining an accurate budget will ensure that 
you do not take scarce funding dollars that you do not re- 
quire for the successful completion of your trial, or alterna- 
tively, that you do not end up underfunded for a large clin- 
ical trial, and are subsequently unable to successfully com- 
plete your trial. 


Key Concepts: Costs to Consider 

Central Methods Center Costs 

• Study coordinator 
• Data manager 
• Data analyst 
• Research assistant(s) 
• Administrative assistant(s) 
• Randomization system 
• Data management system 
• Supplies 
• Communication 
• Travel for meetings and monitoring 


Clinical Center Costs 

• Clinical research coordinator 
• Administrative assistant 
• Patient expenses 
• Communication costs 
• Supplies 


As an example, let us take a trial comparing two surgical 
implants for the treatment of femur fractures. Sixty-five 
trauma centers in North America have agreed to partici- 
pate in the trial; the projected sample size is 2000 patients 
(1000 patients per treatment arm). Patients will be 
screened prior to surgery, randomized, and then followed 
at regular intervals for 2 years. The budget should begin 
with a brief introduction, indicating the sample size of 
the study, where the patients will be recruited, and where 
the trial coordination will take place. It is also important 
to be explicit about the currency in which the budget is 
presented, which costs are included, and whether costs have 
been inflated and by how much. 
Examples from the Literature: Introduction to a 
Budget 

The FEMUR Study will recruit 2000 patients in 65 hospi- 
tals in North America, with 45 sites in the United States 
and 20 sites in Canada. The central methods center will 
be responsible for the day-to-day monitoring and 
management of the FEMUR Study. The budget is presented in 
Canadian dollars (C$), and most costs have been increased at 
an annual rate of 3% to account for price inflation. 
This budget justifies the costs for coordination, collec- 
tion, and analysis of data. 


The first section of the budget provides a justification for 
the budget and details on how the payments to clinical 
sites for their enrollment of research participants are to 
be made. Typically, each center will receive a payment 
per patient enrolled in the study and then payments to 
cover the costs associated with each completed patient fol- 
low-up. 

All clinical sites require preparatory work before starting 
to enroll patients into a large, randomized clinical trial. 
This includes preparing local ethics board submissions, 
site-specific consent forms, contracts, and educational 
activities to prepare for the trial. Study personnel require 
time to learn the study protocol, organize filing systems 
and study materials, and familiarize themselves with the 
randomization and data management systems. The study 
research personnel will prepare local ethics submissions, 
draft consent forms, and conduct in-services. 


Payments to Clinical Sites 


The next item to consider when preparing the budget and 
budget justification for the costs at the clinical sites is the 
time the clinical research coordinator will spend working 
on the trial. The easiest way to do this is to carefully review 
the study protocol, research coordinators’ manual, and 
case report forms, and then make a detailed list of all the 
tasks for which the clinical research coordinator will be re- 
sponsible. It is also important to provide a time estimate 
for each of the tasks and determine how often they will 
need to be performed. Usually determining all costs asso- 
ciated with the enrollment, randomization, and follow- 
up of one patient is a good method of summarizing this in- 
formation. However, if the amount of time is likely to vary 
substantially from patient to patient, providing an esti- 
mate of 10 or 20 patients is another simple way to com- 
plete the budget. Tasks to consider in a budget can include: 
(1) patient screening, recruitment, and obtaining informed 
consent; (2) randomization; (3) collection of baseline data; 
(4) collection of surgical data; (5) collection of in-hospital 
data; (6) patient follow-up; (7) screening of potential re- 
search participants; (8) submitting data to the central 
methods center; (9) filing; (10) responding to quality con- 
trol reports; and (11) responding to adjudication requests. 


Other expenses to consider are patient-related expenses 
such as parking, travel to the hospital or clinic for addi- 
tional follow-up appointments, and additional costs of di- 
agnostic tests that are not included as the usual standard of 
care. It is also important to budget for any office supplies, 
photocopying, faxing, and telephone calls to the central 
methods center. One should also budget for the extra 
time required for site monitoring, attending meetings, 
and locating difficult-to-contact patients. 


Examples from the Literature: Schedule of Payments 
to Clinical Sites 

Each center will receive a per-patient payment for each 
patient enrolled and followed up in the trial. Payments will 
be issued at four time intervals: (1) after hospital dis- 
charge for enrollment into the study, for completion of 
the in-hospital case report forms, and for the hospital 
discharge assessment; (2) at the 6-month follow-up for 
the 2-week, 3-month, and 6-month follow-up visits; 
(3) at the 12-month follow-up for the 9-month and 
12-month follow-up visits; and (4) at the 24-month visit 
for the 18-month and 24-month follow-up visits. 


Table 31.1 outlines our expectations for patient progress 
throughout the entire term of the 4-year study. This table 
has been used to derive the total cost of payments to clin- 
ical sites. Payments to clinical sites include the initial costs 
of site preparation and overall patient care costs consisting 
of payment for clinical research personnel for patient 
screening, enrollment, and follow-up. 


Site Preparation 


All sites require preparatory work before starting. This in- 
cludes preparing local ethics board submissions, site-spe- 
cific consent form, contracts, and educational activities to 
prepare for the trial. Study personnel require time to learn 
the study protocol, organize filing systems and study ma- 
terials, and familiarize themselves with the randomization 
and data management systems. The study research per- 
sonnel will prepare local ethics review board submissions, 
draft consent forms, and conduct in-services. Site investi- 
gators will review the study protocol with their colleagues. 
Site preparation: 35 hours in total. 


Table 31.1 Four-Year Patient-Enrollment Study Schedule 

Year     Patients    6-Month      12-Month     24-Month 
         Enrolled    Follow-up    Follow-up    Follow-up 

1        1500        500 
2        500         1500         1500 
3                                 500          1500 
4                                              500 

Total    2000        2000         2000         2000 
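
To show how a schedule like Table 31.1 can be turned into yearly payment totals, here is a minimal sketch. The patient counts follow the table, read so that every column sums to 2000. The per-patient payment amounts are hypothetical placeholders, not figures from the FEMUR budget, and applying the 3% inflation rate from the second year onward is an assumption about how the inflation adjustment is timed.

```python
# Patients reaching each payment milestone per study year (from Table 31.1).
SCHEDULE = {
    1: {"enrolled": 1500, "6-month": 500, "12-month": 0, "24-month": 0},
    2: {"enrolled": 500, "6-month": 1500, "12-month": 1500, "24-month": 0},
    3: {"enrolled": 0, "6-month": 0, "12-month": 500, "24-month": 1500},
    4: {"enrolled": 0, "6-month": 0, "12-month": 0, "24-month": 500},
}

# Hypothetical per-patient payments in C$ for each milestone (assumption).
PAYMENT = {"enrolled": 400.0, "6-month": 300.0, "12-month": 200.0, "24-month": 200.0}

INFLATION = 0.03  # annual rate; assumed to apply from year 2 onward


def yearly_site_payments(schedule, payment, inflation):
    """Total payments to clinical sites for each study year."""
    totals = {}
    for year, counts in schedule.items():
        base = sum(counts[milestone] * payment[milestone] for milestone in counts)
        totals[year] = base * (1 + inflation) ** (year - 1)
    return totals


site_payments = yearly_site_payments(SCHEDULE, PAYMENT, INFLATION)
for year, total in site_payments.items():
    print(f"Year {year}: C$ {total:,.0f}")
print(f"All years: C$ {sum(site_payments.values()):,.0f}")
```

Replacing the placeholder amounts with the per-milestone payments from your own budget gives the payments-to-sites line for each year of the grant.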
Patient Screening, Recruitment, and Consent 


It is clear that assigned research personnel must possess 
the knowledge and skills found exclusively in seasoned 
clinical research coordinators. The majority of the clinical 
research coordinators will be nurses or will have a 
Bachelor's or Master of Science degree. Therefore, only 
experienced center research personnel will be designated. In all 
centers, clinical research coordinators will screen the hos- 
pital admission list to identify patients who fulfill the elig- 
ibility criteria. Clinical research coordinators will use a 
variety of other approaches to capture patients admitted, 
including checking with the attending orthopaedic sur- 
geon and residents on call, screening the daily surgical 
list for eligible patients and the surgical list from the pre- 
vious day to ensure no patients were missed, and review- 
ing the patient list on surgical wards and intensive care 
units. Centers will also use all potential patient sources, in- 
cluding asking the anesthesiology and emergency medi- 
cine departments, and the surgical services to page the 
clinical research coordinator regarding all admissions 
through the emergency department. The study coordina- 
tor will carry a pager for inquiries during the day and night, 
7 days a week. The orthopaedic surgeon, residents, or clin- 
ical research coordinator will approach all patients (or pa- 
tient families) who fulfill the eligibility criteria to obtain 
informed consent. 

Patient screening and enrollment: 2 hours per patient. 


Randomization, Collection of Data, and Following 
Patients in the Hospital 


After obtaining written informed consent from eligible pa- 
tients or their family members, the clinical research coor- 
dinator will interview and examine patients and review 
their charts and radiographs to obtain information on pa- 
tient demographics, baseline information, and fracture 
characteristics. They will complete the appropriate case 
report forms. Prior to the surgery, the clinical research co- 
ordinator or delegate will randomize the patient using the 
Internet system or the automated telephone randomiza- 
tion system. Once the patient has been enrolled, the clini- 
cal research coordinator will be responsible for scheduling 
the surgery in the appropriate timeframe. The clinical re- 
search coordinator will also contact the operating room 
manager to let him or her know which treatment the pa- 
tient has been randomized to, to ensure that the correct 
implants are ready in the operating suite. Following the 
surgery, the clinical research coordinator will complete 
the surgical case report form and the perioperative case 
report form. The clinical research coordinator will follow 
patients throughout their time in the hospital and will 
personally evaluate patients and review patients’ medical 
records for complications, adverse events, and reopera- 
tions. The clinical research coordinator will complete the 
postoperative case report forms and conduct quality of 
life interviews with the patients at the time of hospital 
discharge. The clinical research coordinator will also be
responsible for obtaining postoperative radiographs. 
Hospital data collection: 6 hours per patient. 


Patient Follow-up and Completion of the Quality of 
Life Interviews 


The clinical research coordinator or research nurse will be 
responsible for scheduling the patients’ follow-up appointments.
Patients will return to the fracture clinic at 3 months,
6 months, 12 months, and 24 months. At each
follow-up visit, the clinical research coordinator will ad- 
minister several quality of life questionnaires to the pa- 
tients. The clinical research coordinator will also ensure 
that the patient has the appropriate radiographs taken at 
each visit. The clinical research coordinator will call the 
patient if they miss an appointment and will reschedule. 
The clinical research coordinator will also interview the 
patient about any complications, adverse events, and re- 
operations and will verify this information from the pa- 
tients’ medical records and radiographs. All information 
will also be verified with the patients’ attending orthopaedic
surgeon. At 2 weeks, 9 months, and 18 months,
the clinical research coordinator will ascertain patient 
status (i.e., reoperations, adverse events, deaths, quality 
of life questionnaires) by telephone and will verify the 
information in medical records. All information will also 
be verified with the patients’ attending orthopaedic sur- 
geon. 

Clinic follow-up: 9 hours per patient (2.25 hours / ap- 
pointment x 4 appointments). 

Telephone follow-up: 3 hours per patient (1 hour / ap- 
pointment x 3 appointments). 


Daily Record of Screened Patients, Submitting/Filing
Data to the Central Methods Center, Responding to 
Quality Control Reports and Adjudication Requests 


The clinical research coordinator will keep daily records of 
all patients who were screened for the study. This will
include patients with femoral neck fractures who did not
meet the eligibility criteria and patients who were
eligible but were not enrolled in the study, together with
the reason(s) why. Research personnel will send these records
to the project office weekly. The clinical research coordina- 
tor will submit (either electronically or by fax) all case re- 
port forms to the central methods center and they will file 
patients’ case report forms at their individual centers. The 
clinical research coordinator will respond to all quality 
control reports. If patients indicate that they have had an 
adverse event, complication, or reoperation, then the clin- 
ical research coordinator will contact the attending ortho- 
paedic surgeon and will review the medical records and 
radiographs to obtain the appropriate documentation for 
adjudication. 

Daily record of screened patients, submitting data to 
the project office, filing data, responding to quality control 
reports, and responding to adjudication requests: 3 hours
per patient. 


Patient-related and General Office Expenses 


We will also provide each site with U.S.$ 300.00 per patient 
to cover additional costs associated with enrolling patients 
into this trial. These costs include patient-related expenses 
(parking and reimbursement of travel), office supplies, 
photocopying, faxing, and telephone calls to the central 
methods center. In addition, this will include the extra 
time required for monitoring visits, attending meetings, 
and locating hard to find patients. 

Patient-related expenses, office-related supplies and 
expenses: U.S.$ 300.00 per patient. 


Examples from the Literature: Summary of Payments 
to Clinical Sites 

We anticipate that we will require 23 hours of clinical
research coordinator time for each patient enrolled in the
FEMUR study. The salary rate for a clinical research
coordinator in ongoing studies is C$40.13/hour (wages) plus
C$12.04/hour (fringe benefits), totaling C$52.17/hour.
Therefore, each patient will cost approximately C$1200.00 of
clinical research coordinator time. Adding in patient-related
expenses, the total cost per patient enrolled will be
C$1500.00. An extensive breakdown of costs at each
payment period throughout the term of the study can be
seen in Table 31.2. We have added 3% per year to cover
inflation in the expenses.
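
The per-patient figure can be checked with a few lines of arithmetic. The following Python sketch simply adds up the coordinator hours quoted in the task descriptions above and applies the hourly rate. The variable names are illustrative, and compounding the 3% increase yearly is an assumption, although it reproduces the hourly rates listed in Table 31.2.

```python
# Coordinator hours per patient, taken from the task estimates above.
hours_per_patient = {
    "screening and enrollment": 2.0,
    "hospital data collection": 6.0,
    "clinic follow-up": 2.25 * 4,          # four clinic visits
    "telephone follow-up": 1.0 * 3,        # three telephone contacts
    "records, filing, quality control": 3.0,
}
total_hours = sum(hours_per_patient.values())

hourly_rate = 40.13 + 12.04                # wages + fringe benefits, C$/hour
coordinator_cost = total_hours * hourly_rate
patient_related_expenses = 300.00          # per patient
cost_per_patient = coordinator_cost + patient_related_expenses

# Hourly rate for Years 1 to 4, assuming 3% inflation compounded yearly.
rate_by_year = [round(hourly_rate * 1.03 ** year, 2) for year in range(4)]

print(f"{total_hours:.0f} hours x C${hourly_rate:.2f} = C${coordinator_cost:.2f}")
print(f"Cost per patient before rounding: C${cost_per_patient:.2f}")
print("Hourly rate by year:", rate_by_year)
```

The unrounded coordinator cost is C$1199.91, which the budget rounds to C$1200.00 to arrive at C$1500.00 per patient.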

We will also require 35 hours of clinical research coordinator
time for site preparation, which equates to approximately
C$1,826 per site. We anticipate having 65 centers, for a
total site preparation cost of C$118,690.00. Utilizing our
estimated patient progress during the study, as well as the
above payment schedule, we can obtain a total cost for
payments to clinical sites, as outlined in Table 31.3.
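
As a rough check on how the yearly payments to clinical sites follow from the enrollment schedule (Table 31.1) and the per-milestone payments (Table 31.2), the calculation can be sketched in Python as follows. The function and variable names are hypothetical. The sketch assumes that the rounded C$1200.00 coordinator cost is split across milestones in proportion to hours, that patient-related expenses are paid at enrollment, and that inflation compounds yearly.

```python
# Patients reaching each payment milestone in each study year (Table 31.1).
schedule = {
    1: {"enrollment": 1500, "6-month": 500},
    2: {"enrollment": 500, "6-month": 1500, "12-month": 1500},
    3: {"12-month": 500, "24-month": 1500},
    4: {"24-month": 500},
}
# Coordinator hours paid at each milestone (Table 31.2); they sum to 23.
milestone_hours = {"enrollment": 8, "6-month": 5.5, "12-month": 3.25, "24-month": 6.25}

COORDINATOR_COST = 1200.00   # C$ per patient for 23 hours (rounded figure)
PRE = 300.00                 # patient-related expenses, paid at enrollment
INFLATION = 0.03             # assumed to compound yearly


def payment_per_patient(milestone, year):
    """Payment (C$) for one patient reaching a milestone in a given year."""
    base = COORDINATOR_COST * milestone_hours[milestone] / 23
    if milestone == "enrollment":
        base += PRE
    return base * (1 + INFLATION) ** (year - 1)


site_preparation = round(35 * 52.17) * 65   # C$1,826 per site, 65 sites

totals_by_year = {}
for year, counts in schedule.items():
    total = sum(n * payment_per_patient(m, year) for m, n in counts.items())
    if year == 1:
        total += site_preparation           # paid once, at study start-up
    totals_by_year[year] = round(total, 2)

grand_total = sum(totals_by_year.values())
for year, total in totals_by_year.items():
    print(f"Year {year}: C${total:,.2f}")
print(f"Total:  C${grand_total:,.2f}")
```

Under these assumptions the yearly figures agree with Table 31.3 to within a cent of rounding.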


Central Methods Center Costs 


The operating costs of a large central methods center can 
contribute substantially to a trial’s budget. The template 
we have developed divides the costs at the central meth- 
ods center into three categories: (1) staffing costs; (2) 
study administration costs; and (3) travel and meeting 
costs. Salary support is a large part of the budget of a clinical
trial, and the regulations for salary support vary between
funding agencies. For example, the majority of Canadian
funding institutions do not allow financial support
for salaries for physicians and professors, whereas many 
American agencies, such as the National Institutes of 
Health (NIH), do. Whenever possible, include the names 
and qualifications of the individuals who will be involved 


Table 31.2 Four-Year Breakdown of Costs to Clinical Sites

Year   Rate/Hour   PRE      Enrollment        6 Months      12 Months      24 Months      Total PMT
                            (8 hours + PRE)   (5.5 hours)   (3.25 hours)   (6.25 hours)   (23 hours + PRE)
       C$          C$       C$                C$            C$             C$             C$
1      52.17       300.00   717.39            286.96        169.57         326.09         1,500.00
2      53.74       309.00   738.91            295.57        174.65         335.87         1,545.00
3      55.35       318.27   761.08            304.43        179.89         345.95         1,591.35
4      57.01       327.82   783.91            313.57        185.29         356.32         1,639.09

3% has been added each year to cover the cost of inflation.
Abbreviations: PMT, payment; PRE, patient-related expenses.


Table 31.3 Payments to Clinical Sites by Year

Year    Site           Patient          6-Month         12-Month        Patient          Total PMT
        Preparation    Enrollment PMT   Follow-up PMT   Follow-up PMT   Completion PMT   for the Year
        C$             C$               C$              C$              C$               C$
1       118,690.00     1,076,086.96     143,478.26      -               -                1,338,255.22
2       -              369,456.52       443,347.83      261,978.26      -                1,074,782.61
3       -              -                -               89,945.87       518,918.48       608,864.35
4       -              -                -               -               178,162.01       178,162.01
Total   118,690.00     1,445,543.48     586,826.09      351,924.13      697,080.49       3,200,064.18

Total payments to clinical sites: C$3,200,064.18

3% has been added each year to cover the cost of inflation.
Abbreviations: PMT, payment.

in the trial, as it indicates that you already have qualified
personnel to coordinate your trial. We calculated salaries 
based upon an annual inflation rate of 3% for merit and 
cost of living increases. Study administration costs include 
computers, the data management system, the randomiza- 
tion system, communication with clinical centers, and 
printing. Each of these items is detailed below. Finally, tra- 
vel and committee meetings are an important means of 
communication in a large multicenter randomized trial. 
These expenses and their justifications are also detailed 
below. 


Staffing Costs 


• Nominated principal investigator: The nominated principal
investigator will spend 10.5 hours per week during
the study planning and data analysis periods and 7 hours 
per week during the recruitment and follow-up periods. 
The nominated principal investigator is responsible for 
the overall supervision of the study. The nominated 
principal investigator brings substantial experience in 
the conduct of large orthopaedic trauma studies. Salary 
costs are not eligible and will be covered by our univer- 
sity. 

• Co-principal investigator: The co-principal investigator
will spend 2 hours per week on the project. The co-prin- 
cipal investigator has played a key role in over 25 major 
clinical studies (including both observational and rando- 
mized clinical trials) and has expertise in the conduct of 
cohort studies. The co-principal investigator will provide 
advice on key design, methodological, and practical as- 
pects of study conduct and will also serve on the steering 
committee. Salary costs are not eligible and will be cov- 
ered by our university. 

• Co-investigator: There will be several co-investigators
with biostatistical and methodological expertise and clinical
trial experience who will serve on the steering committee
and as advisors. Salary costs are not eligible and
will be covered by our university. 

• Study coordinator: The study coordinator will spend 17.5
hours per week (910 hours per year) during the study. 
The study coordinator has extensive experience in the 
area of clinical research and large orthopaedic trauma 
trials. The study coordinator will be responsible for the 
daily conduct of the study, including data management, 
preparing monthly reports on patient screening, patient 
follow-up, data transmission, consistency and thorough- 
ness of data collection, and event rates; transmitting 
these reports to centers; developing and transmitting 
to all site investigators and clinical research coordinators 
weekly enrollment reports; communicating with site in- 
vestigators and clinical research coordinators regarding 
protocol and other procedural questions; coordinating 
the shipping of study aids; writing study newsletters; 
maintaining required documentation by regulatory 
agencies; training of the research assistants; reviewing 
all events prior to adjudication; compiling all the records 
Table 31.4 Salary Budget for the Study Coordinator

Year     Rate/Hour*    Hours         Total
         C$            Spent/Year    C$
1        41.76         910           38,001.60
2        43.01         910           39,141.65
3        44.30         910           40,315.90
4        45.63         910           41,525.37
Total                  3640          158,984.52

* Annual inflation rate of 3% factored in for merit and cost of
living increases.


required for the adjudication process; overseeing the 
adjudication process; preparing presentations and 
packages for the study committees; organizing investi- 
gator meetings, steering committee meetings, adjudica- 
tion committee meetings, and weekly project office 
meetings with the nominated principal investigator. 
The current salary rate for a study coordinator in ongoing 
studies is C$32.12 per hour (wages) plus C$9.64 per hour 
(fringe benefits) totaling C$41.76 per hour. As stated, the 
study coordinator will spend 910 hours on this project 
each year for the duration of the trial (Table 31.4). 
• Senior statistician: The senior statistician will spend 1
hour per week on the FEMUR study and will provide gui- 
dance and expertise for the study statistician. The senior 
statistician has extensive experience in the analysis of 
large randomized trial data. Salary costs are not eligible 
and will be covered by our university. 

• Study statistician: The study statistician will spend 3.5
hours per week (182 hours per year) during the planning 
(first 3 months), recruitment (12 months), and follow-up 
periods (24 months), and 35 hours per week during the 
data analysis period (remaining 9 months). Therefore, 
the study statistician will spend 1410.5 hours (3.5 
hours/week x 13 weeks + 35 hours/week x 39 weeks) 
during year 4 of the study. The study statistician has a 
Masters degree in statistics and prior experience with 
orthopaedic trauma randomized controlled trials. She 
or he is responsible for conducting weekly data checks 
to monitor the data quality, performing ongoing analysis 
of data, preparing weekly study reports, and conducting 
statistical analysis for all study meetings, scientific meet- 
ings, and publications. The current salary rate for study 
statisticians in ongoing studies is C$33.28 per hour 
(wages) plus C$9.98 per hour (fringe benefits) totaling 
C$43.26 per hour (Table 31.5). 

• Data manager: The data manager will spend 35 hours
per week during the planning phase (first 3 months) and 
3.5 hours per week (182 hours per year) during recruit- 
ment, follow-up, and analysis periods (remaining 45 
months). Therefore, the data manager will spend 591.5 
hours (35 hours/week x 13 weeks + 3.5 hours/week x 
39 weeks) during the first year of the study. The data 
manager will create the initial database, including inter- 
Table 31.5 Salary Budget for the Study Statistician

Year     Rate/Hour*    Hours         Total
         C$            Spent/Year    C$
1        43.26         182           7,873.32
2        44.56         182           8,109.52
3        45.89         182           8,352.81
4        47.27         1410.5        66,676.27
Total                  1956.5        91,011.91

* Annual inflation rate of 3% factored in for merit and cost of living
increases.


Table 31.6 Salary Budget for the Data Manager

Year     Rate/Hour*    Hours         Total
         C$            Spent/Year    C$
1        39.58         591.5         23,411.57
2        40.77         182           7,419.67
3        41.99         182           7,642.26
4        43.25         182           7,871.52
Total                  1137.5        46,345.02

* Annual inflation rate of 3% factored in for merit and cost of living
increases.


nal validity and range checks and will format the case re- 
port forms so that they will be suitable for him or her to 
begin the more time-consuming task of actually con- 
structing the database. This database will allow data to 
be submitted by fax or directly through a secure Internet 
Web site. The data manager will work intensively with 
the nominated principal investigator and study coordi- 
nator to ensure logical flow of data collection. She will 
also work closely with the study statistician during the 
analysis phase. The current salary rate for data managers 
in ongoing studies is C$30.45 per hour (wages) plus 
C$9.13 per hour (fringe benefits) totaling C$39.58 per 
hour (Table 31.6). 

• Research assistants: We will require two research assistants
with Bachelor of Science degrees and several years
of experience as a research assistant. Combined, the two 
research assistants will spend a total of 70 hours per 
week (3640 hours per year) for the duration of the trial 
and will be responsible for ensuring that data quality is 
maintained through the multilevel validation of the 
case report forms on the study. It is expected that on 
average patients will have 200 forms. Overall, there 
will be 400,000 case report forms to be validated in the 
4-year period. The research assistants will also assist 
with the production and printing of study material, ar- 
ranging meetings and travel, coordinating reimburse- 
ments to participating sites, and preparing packages 
for adjudication and meetings. The current salary rate 




Table 31.7 Salary Budget for a Research Assistant

Year     Rate/Hour*    Hours         Total
         C$            Spent/Year    C$
1        28.98         3640          105,487.20
2        29.85         3640          108,651.82
3        30.74         3640          111,911.37
4        31.67         3640          115,268.71
Total                  14560         441,319.10

* Annual inflation rate of 3% factored in for merit and cost of living
increases.


for a research assistant in ongoing studies is C$22.29 
per hour (wages) plus C$6.69 per hour (fringe benefits) 
totaling C$28.98 per hour (Table 31.7). 


A summary of all staffing expenses for the central methods
center is presented in Table 31.8. 
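
The staffing figures can be reproduced from the Year 1 hourly rates and the hours per year stated for each salaried role. The Python sketch below is one way to do so. The names are illustrative, and it assumes the 3% increase is compounded on the unrounded hourly rate, which appears to be how the table totals were derived.

```python
INFLATION = 0.03  # assumed to compound yearly on the unrounded rate

# role: (Year 1 hourly rate in C$, hours worked in Years 1 to 4)
roles = {
    "Study coordinator":   (41.76, [910, 910, 910, 910]),
    "Study statistician":  (43.26, [182, 182, 182, 3.5 * 13 + 35 * 39]),
    "Data manager":        (39.58, [35 * 13 + 3.5 * 39, 182, 182, 182]),
    "Research assistants": (28.98, [3640, 3640, 3640, 3640]),
}


def salary_by_year(rate, hours):
    """Yearly salary cost (C$) given a Year 1 rate and hours per year."""
    return [rate * (1 + INFLATION) ** i * h for i, h in enumerate(hours)]


budget = {role: salary_by_year(rate, hours) for role, (rate, hours) in roles.items()}
year_totals = [sum(costs[i] for costs in budget.values()) for i in range(4)]
grand_total = sum(year_totals)

for role, costs in budget.items():
    print(f"{role:<20}", " ".join(f"{c:>11,.2f}" for c in costs), f"{sum(costs):>12,.2f}")
print(f"{'Total':<20}", " ".join(f"{t:>11,.2f}" for t in year_totals), f"{grand_total:>12,.2f}")
```

Keeping the hours per year in a list makes it easy to model roles whose workload is concentrated in one phase, such as the statistician in Year 4 or the data manager in Year 1.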


Study Administration Costs 


• Computer hardware: We will need to purchase only three
new computers for the study, as some of the other study 
personnel will be able to work on existing hardware. 
Each personal computer will cost C$1700.00. In addition, 
a networked printer is required for the study, costing 


Table 31.8 Summary of Central Methods Center Staffing Costs by Year

Role                  Year 1       Year 2       Year 3       Year 4       Total
                      C$           C$           C$           C$           C$
Study coordinator     38,001.60    39,141.65    40,315.90    41,525.37    158,984.52
Study statistician    7,873.32     8,109.52     8,352.81     66,676.27    91,011.91
Data manager          23,411.57    7,419.67     7,642.26     7,871.52     46,345.02
Research assistants   105,487.20   108,651.82   111,911.37   115,268.71   441,319.10
Total                 174,773.69   163,322.65   168,222.33   231,341.88   737,660.55

Total central methods center staffing costs: C$737,660.55

3% has been added each year to cover the cost of inflation.
Table 31.9 Computer Hardware Costs

Hardware             Price      Quantity   Total
                     C$                    C$
Personal computer    1700.00    3          5100.00
Network printer      1000.00    1          1000.00
Total                                      6100.00


C$1000.00. The total cost is C$6100.00 and these items 
will be purchased during the planning phase (Year 1) 
(Table 31.9). 

• Clinical database system: Data management software will
be used for receipt and processing of all data. The data- 
base management software, DataFax, consists of the fol- 
lowing components; 3% has been added each year to each 
of the component’s costs to cover the cost of inflation. 
— Licenses: We will need four computers with the database
license (two for research assistants, one for the
study coordinator, and one for the nominated principal
investigator) for the duration of 4 years. The fee
schedule (in U.S. dollars) for licensing is given in 
Table 31.10. 

Currently, the central methods center holds licensing
for 13 seats, so each additional seat falls in the
U.S.$1000.00-per-seat tier; therefore, it will cost
U.S.$1000.00 per computer per year for the DataFax
licenses, totaling U.S.$4000.00 per year. Although
exchange rates are not static, we will use the current
rate of C$1.15 per U.S.$1.00. Therefore, licensing will cost
approximately C$4600.00 (U.S.$4000.00 x 1.15) per year.

— Maintenance, support, and upgrades: The study will 
use some of the existing hardware at the central 
methods center in addition to the new hardware pur- 
chased. During the course of the study, some of the 
new hardware, as well as the existing hardware, will 
need upgrades, maintenance, and/or replacement. 
In addition, computer supplies, such as backup tapes 
and printer toners are required throughout the study. 
Based on our experience in previous trials, we have 
estimated these costs to be C$2500.00 per year. 
Further costs include annual hardware maintenance 
and computer support, which are outlined in 
Table 31.11. 


Table 31.10 Schedule of Database Licensing Fees

Description              Breakdown                                  Cost
                         U.S.$                                      U.S.$
For the first 5 seats    16,000 + 13,000 + 10,000 + 7000 + 4000 =   50,000.00
For the next 25 seats    1000/seat                                  1000.00
After 30 seats           500/seat                                   500.00
DataFax server           Each additional multiuser                  10,000.00
DataFax server           Each additional single user                5000.00
Table 31.11 Annual Computer Hardware Maintenance
and Support Costs

Description             Cost/year
                        C$
Hardware maintenance    579.97
Computer support        13,520.00
Total cost per year     14,099.97


FEMUR is one of four projects subject to the above 
costs; therefore, the study will cover one-fourth of 
the total annual costs. These costs will continue 
throughout the 4-year duration of the study at a cost
of C$3525.00 (C$14,099.97 x 0.25) per year. In total,
it will cost C$6025.00 (C$3525.00 + C$2500.00) per
year for maintenance, support, and upgrade costs.

— Faxes received: The FEMUR Study will receive a total of
400,000 (200 per patient x 2000 patients) case report 
forms to be completed and faxed to the central meth- 
ods center from the time patients are enrolled in the 
study to their last follow-up visit. We have calculated 
the cost of faxes received on a yearly basis, taking into 
account the varying number of forms for each follow- 
up visit. It will cost C$0.07 per fax received; the 
volume per year has been calculated and the results 
are given in Table 31.12. 

— Faxes sent: Quality control reports on individual case
report forms with errors or omissions will be sent to 
centers weekly by fax. These reports identify out- 
standing quality control issues and will also include 
reminders for follow-up appointments for patients. 
Each site will be sent a quality control report (~3 
pages) each week from the time they begin randomi- 
zation (4th month) to the end of the study. Therefore, 
we will send 7,605 (65 sites x 3 pages x 39 weeks) 
faxes in the first year of the study and 10,140 (65 sites 
x 3 pages x 52 weeks) faxes per year thereafter 
(Table 31.13). 


Table 31.12 Costs Associated with Faxes Received

Year     Cost/Fax*    Pages Received    Total
         C$                             C$
1        0.07         150,831           10,558.17
2        0.07         170,670           12,305.31
3        0.07         70,499            5235.47
4        0.08         8000              611.93
Total                 400,000           28,710.87

* 3% has been added each year to cover the cost of inflation.
Table 31.13 Costs Associated with Faxes Sent

Year     Cost/Fax*    Pages Faxed    Total
         C$                          C$
1        0.50         7,605          3,802.50
2        0.52         10,140         5,222.10
3        0.53         10,140         5,378.76
4        0.55         10,140         5,540.13
Total                 38,025         19,943.49

* 3% has been added each year to cover the cost of inflation.


A summary of data management component costs by year
is given in Table 31.14.
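
The four data management components can be combined in a short Python sketch to see how the yearly totals arise. The helper name `inflate` and the variable names are hypothetical, the yearly page volumes for faxes received are taken directly from Table 31.12 rather than derived, and inflation is assumed to compound yearly on unrounded unit costs.

```python
INFLATION = 0.03          # assumed to compound yearly
YEARS = range(4)          # index 0 is Year 1


def inflate(amount, year_index):
    """Apply compounded yearly inflation to a Year 1 amount."""
    return amount * (1 + INFLATION) ** year_index


license_year1 = 4 * 1000.00 * 1.15         # four seats, U.S.$ converted to C$
maintenance_year1 = 3525.00 + 2500.00      # shared support + supplies/upgrades

pages_received = [150_831, 170_670, 70_499, 8_000]    # from Table 31.12
pages_sent = [65 * 3 * 39] + [65 * 3 * 52] * 3         # sites x pages x weeks

costs = {
    "License":        [inflate(license_year1, y) for y in YEARS],
    "Maintenance":    [inflate(maintenance_year1, y) for y in YEARS],
    "Faxes received": [inflate(0.07, y) * pages_received[y] for y in YEARS],
    "Faxes sent":     [inflate(0.50, y) * pages_sent[y] for y in YEARS],
}
grand_total = sum(sum(row) for row in costs.values())

for name, row in costs.items():
    print(f"{name:<15}", " ".join(f"{c:>10,.2f}" for c in row), f"{sum(row):>11,.2f}")
print(f"Total data management cost: C${grand_total:,.2f}")
```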

• Communication: This includes all conference calls and
communication with clinical sites costs; 3% has been 
added each year to cover the cost of inflation. 


— Conference calls: Conference calls will be made to each
clinical center three times per year throughout 
the duration of the 4-year study. At an average cost 
of C$30.00 per call, we expect a yearly cost of 
C$5850.00 (C$30.00 per call x 65 sites x 3 calls) in 
conference calls. 

— Communication with clinical sites: We expect to make
two long-distance telephone calls per site each month 
for the duration of the trial at an average cost of 
C$3.00 per call. Per annum, these calls amount to 
C$4680.00 (C$3.00 per call x 24 calls per year x 65 
sites). Furthermore, due to the geographic scope of 
the FEMUR Study, the study coordinator will need to 
be available after hours. Therefore, the study coordi- 
nator will have a pager and cell phone for the duration 
of the study in case a site needs to contact them after
hours or in an emergency. The cost of the pager has
been estimated to be C$300.00 per year (C$25.00 
per month) and the cost of the cell phone has been 
estimated to be C$600.00 per year (C$35.00 per 
month + monthly user charges of C$15.00 per month) 
(Table 31.15). 
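
The communication line items reduce to a handful of multiplications, sketched below in Python. The variable names are illustrative, and the yearly 3% increase is assumed to compound, which is consistent with the figures in Table 31.15.

```python
SITES = 65
INFLATION = 0.03  # assumed to compound yearly

conference_calls = 30.00 * SITES * 3       # three calls per site per year
site_calls = 3.00 * 24 * SITES             # two calls per site per month
pager = 25.00 * 12
cell_phone = (35.00 + 15.00) * 12          # monthly fee + monthly user charges
site_communication = site_calls + pager + cell_phone

conference_by_year = [conference_calls * (1 + INFLATION) ** y for y in range(4)]
site_by_year = [site_communication * (1 + INFLATION) ** y for y in range(4)]
total = sum(conference_by_year) + sum(site_by_year)

print(f"Conference calls, Year 1:   C${conference_calls:,.2f}")
print(f"Site communication, Year 1: C${site_communication:,.2f}")
print(f"Four-year total:            C${total:,.2f}")
```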

• Randomization: We are using a central telephone and
Internet randomization schedule to ensure concealed
allocation. The costs are based on our prior large multicenter
studies.

— Programming: A computer technician is required to 
program and set-up the telephone and Internet ran- 
domization system. In our previous experience and 
due to the complexity of building such a system, 
this has taken up to 150 hours of the technician’s 
time. The current rate for the computer technician’s 
services is C$100.00 per hour. Therefore, the initial 
programming for the randomization system will 
cost C$15,000.00. 

— Maintenance: Ongoing maintenance fees are also re- 
quired for the telephone and Internet randomization 
system. To offer the Internet randomization service, 
a server must be housed in one of the rooms at 
McMaster University. The price of housing this server 
is C$125.00 per month. To offer the telephone rando- 
mization service, a telephone line must be estab- 
lished, as well as a toll-free number for each country 
involved in the study. The price of a telephone line 
is C$50.00 per month and the price of one toll-free 
number is C$8.00 per month. Therefore, the total 
cost of telephone maintenance per month is 
C$154.00 (C$50.00 per month + C$8.00 per toll free 
line x 13 toll free lines). Therefore, the total cost of 


Table 31.14 Summary of Data Management Component Costs by Year

DataFax Component    Year 1       Year 2       Year 3       Year 4       Total
                     C$           C$           C$           C$           C$
License              4600.00      4738.00      4880.14      5026.54      19,244.68
Maintenance          6025.00      6205.75      6391.92      6583.68      25,206.35
Faxes received       10,558.17    12,305.31    5235.47      611.93       28,710.87
Faxes sent           3802.50      5222.10      5378.76      5540.13      19,943.49
Total                24,985.67    28,471.16    21,886.29    17,762.28    93,105.40


Table 31.15 Summary of Communication Component Costs by Year

Communication Component             Year 1       Year 2       Year 3       Year 4       Total
                                    C$           C$           C$           C$           C$
Conference calls                    5850.00      6025.50      6206.27      6392.45      24,474.22
Communication with clinical sites   5580.00      5747.40      5919.82      6097.42      23,344.64
Total                               11,430.00    11,772.90    12,126.09    12,489.87    47,818.86

3% has been added each year to cover the cost of inflation.






maintaining the randomization system per month is 
C$279.00 (C$154.00 + C$125.00). 

— Telephone costs: This includes the cost per call to the
randomization system. We estimate that half of the 
centers will use telephone randomization and the 
other half will use Internet randomization. Therefore, 
we expect to randomize 1000 patients by telephone. 
As previously stated, 75% of these patients will be en- 
rolled in Year 1 of the study and the remaining 25% in 
Year 2. The Internet randomization system does not 
incur telephone costs; however, based on our experi- 
ence, costs for the telephone randomization system 
average C$2.00 per call. Therefore, we expect to incur 
a cost of C$1500.00 (750 patients randomized by 
telephone x C$2.00 per call) in Year 1 and a cost of 
C$515.00 (250 patients randomized by telephone x 
C$2.06 per call) in Year 2. 


A summary of the randomization component costs by year
is given in Table 31.16. 
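
The randomization costs combine a one-time programming charge, monthly maintenance, and per-call telephone charges. A minimal Python sketch of that arithmetic follows. The variable names are illustrative, and the sketch assumes, as Table 31.16 suggests, that maintenance and telephone costs are incurred only in Years 1 and 2, with 3% added in Year 2.

```python
INFLATION = 0.03  # applied to Year 2 costs

programming = 150 * 100.00                         # technician hours x hourly rate
telephone_monthly = 50.00 + 8.00 * 13              # line + 13 toll-free numbers
maintenance_monthly = telephone_monthly + 125.00   # plus server housing

# Years 1 and 2 only (assumption based on Table 31.16).
maintenance = [maintenance_monthly * 12,
               maintenance_monthly * 12 * (1 + INFLATION)]
calls = [750 * 2.00,                               # 75% of 1000 telephone patients
         250 * 2.00 * (1 + INFLATION)]             # remaining 25% in Year 2

total = programming + sum(maintenance) + sum(calls)

print(f"Programming:  C${programming:,.2f}")
print(f"Maintenance:  C${maintenance[0]:,.2f} (Year 1), C${maintenance[1]:,.2f} (Year 2)")
print(f"Telephone:    C${calls[0]:,.2f} (Year 1), C${calls[1]:,.2f} (Year 2)")
print(f"Total:        C${total:,.2f}")
```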


• Computer software: These costs include the licensing
costs for software. We will use the following software 
for this project: 


— SAS: Statistical analysis software (SAS Institute, Cary,
NC). Licensing costs C$120.00 per personal computer 
per year. One license is needed for the study statisti- 
cian. 

— SPSS: Statistical analysis software (SPSS Inc., Chicago,
IL). Licensing costs C$300.00 per personal computer 
per year. 


— Adobe Framemaker 7.2 and Media Package (Adobe 
Systems Inc., San Jose, CA): Design of the case report 
forms for data. Licensing costs C$176.00 per personal 
computer plus a media conversion package costing 
C$35.00 per personal computer, totaling C$211.00 
per personal computer (one time only). 

— Microsoft Office 2003 licenses (Microsoft, Inc., Red- 
mond, WA): To be installed on all relevant personal 
computers. Licensing costs C$83.00 per personal 
computer (one time only). 

— Adobe Photo Shop CS 2 for Windows and Media 
Package (Adobe Systems Inc., San Jose, CA): Crop and 
blind radiographs for adjudication. Licensing costs 
C$155.00 per personal computer plus a media conver- 
sion package costing C$35.00 per personal computer, 
totaling C$190.00 per personal computer (one time 
only). 


As per the above list, some of this software will require ad- 
ditional user licenses and for the others, we have assigned 
prorated costs for this project based on total costs for soft- 
ware licenses and the usage for this project as seen in 
Table 31.17. 


• Printing and center supplies: The printing and center
supplies will be created at the central methods center and
sent to each of the 65 clinical sites. We anticipate that 
the total cost for printing and center supplies will be 
C$94,300.19 as outlined in Table 31.18 and in the item 
descriptions below. 


Table 31.16 Summary of Randomization Component Costs by Year

Randomization Component   Year 1       Year 2     Year 3   Year 4   Total
                          C$           C$         C$       C$       C$
Programming               15,000.00    -          -        -        15,000.00
Maintenance               3348.00      3448.44    -        -        6796.44
Telephone costs           1500.00      515.00     -        -        2015.00
Total                     19,848.00    3963.44    -        -        23,811.44


Table 31.17 Prorated Costs Based on Total Costs for Software Licenses and Usage

Product       License Cost/PC   # of PCs   Year 1     Year 2     Year 3     Year 4     Total
              C$                           C$         C$         C$         C$         C$
SAS           120.00            1          120.00     123.60     127.31     131.13     502.04
SPSS          300.00            3          900.00     927.00     954.81     983.45     3765.26
Framemaker    211.00            4          844.00     -          -          -          844.00
MS Office     83.00             3          249.00     -          -          -          249.00
Photo Shop    155.00            3          465.00     -          -          -          465.00
Total         869.00            3          2578.00    1050.60    1082.12    1114.58    5825.30

3% has been added each year to cover the cost of inflation.

Abbreviations: PC, personal computer.
Table 31.18 Total Costs for Printing and Center Supplies

Item                   Cost/Unit   # of Units   Year 1       Year 2       Year 3     Year 4     Total
                       C$                       C$           C$           C$         C$         C$
Manual of operations   12.46       131          1632.26      -            -          -          1632.26
Protocol               7.00        131          917.00       -            -          -          917.00
Pocket protocol        1.55        1300         2015.00      -            -          -          2015.00
Study posters          3.62        650          2353.00      -            -          -          2353.00
Patient binders        33.17       2000         49,755.00    17,082.55    -          -          66,837.55
Adjudication binders   100.91      128          9687.36      3325.99      -          -          13,013.35
Patient ID cards       0.10        2000         150.00       51.50        -          -          201.50
Incentives             4.00        120          280.00       494.40       509.23     524.51     1808.14
Photocopying           60.00       12           720.00       741.60       763.85     786.76     3012.21
General supplies       50.00       12           600.00       618.00       636.54     655.64     2510.18
Total                                           68,109.62    22,314.04    1909.62    1966.91    94,300.19

3% has been added each year to cover the cost of inflation.


— Manual of operations: The manual of operations is a 
guide for the site investigator and clinical research co- 
ordinator at each site, on how to carry out the trial. It 
consists of a 1.5 inch binder (C$7.46 each) and 50 
pages of information (C$0.10 per page). The total 
cost per unit is C$12.46. There will be two units given 
to each center and one unit given to the central meth- 
ods center. 

— Protocol: The full protocol includes all of the specifica- 
tions of the study and is written by the nominated 
principal investigator during the study development 
phase. The protocol consists of 70 pages of informa- 
tion (C$0.10 per page). The total cost per unit is 
C$7.00. There will be two units given to each center 
and one unit given to the central methods center. 

— Pocket protocol: The pocket protocol is used by the 
clinical research coordinators, orthopaedic residents, 
and orthopaedic surgeons as a reminder of who to in- 
clude in the study. The design of the protocol card 
takes 3 hours by a graphic designer who charges 
C$20.00 per hour for their services. The printing costs 
C$1.50 per card. Factoring in design charges (C$60.00 
design charges divided by 1300 pocket protocols), the 
total unit cost is C$1.55. There will be 20 pocket pro- 
tocols given to each clinical center. 

— Study posters: The study posters are used to inform 
the residents and surgeons of the study and serve to 
remind them of who to include/exclude. The design 
of the poster takes 4 hours by a graphic designer 
who charges C$20.00 per hour for their services. The
printing costs are C$3.50 per poster. Factoring in design
charges (C$80.00 design charges divided by 650 study
posters), the total unit cost is C$3.62. There
will be 10 study posters given to each center. 

— Study patient binders: The patient binder is used by 
each site to store the case report forms (CRFs) for
each patient enrolled in the study. The binder consists
of one set of 15 dividers (C$5.71 per package of 15), a 1.5
inch binder (C$7.46 per binder), and 200 pages of
CRFs (C$0.10 per page). The total cost per unit is
C$33.17. There will be 2000 binders made (one for
each patient). 


— Adjudication binders: The adjudication binder is used
by the central outcomes adjudication committee at
their meetings, which occur four times per year (16 
meetings total). The binder consists of a 3 inch binder 
(C$11.92 per binder), a set of 24 dividers (C$8.99 
each), and 800 pages of information (C$0.10 per 
page). The unit cost per binder is C$100.91. There 
will be six members of the central outcomes adjudica- 
tion committee plus the study coordinator and re- 
search assistant, each receiving one binder per meet- 
ing; therefore, the total number of units required is 
128 (eight binders x 16 meetings). 

— Patient identification sheets: These sheets are put in
the patient’s chart identifying them as a study parti- 
cipant. There will be 2000 patient ID sheets (one for 
each patient) at a cost of C$0.10 per page. 

— Incentives: Each month, starting after the first month
of enrollment (5th month), 10 prizes will be given to 
various sites for superior performance in enrollment, 
recruitment, and clean data. Therefore, 70 prizes will 
be given out in Year 1 and 120 per year thereafter. 
Prizes include study pens, pads of paper, etc. Prizes 
cost approximately C$4.00 each. 

— Photocopying: General office photocopying for the
study at the central methods center is required for 
memos, letters, fax cover pages, invitations to meet- 
ings, conference call information, etc. We average 
C$60.00 per month for 48 months; 3% inflation has 
been added each year. 
— Office supplies: General office supplies for the study
will be purchased, including printer and fax machine 
toners, pens, pencils, paper, etc. We average C$50.00 
per month for 48 months; 3% inflation has been 
added each year. 
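
The unit costs quoted in the items above can be rebuilt from their components. The following is a minimal sketch, not the authors' own worksheet: the helper name `unit_cost` and its parameters are assumptions introduced for illustration, while the component prices, design hours, and print runs are the ones stated in the text. It adds the per-unit components and spreads any one-time design fee over the print run.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
PAGE = Decimal("0.10")  # C$ per printed page, as stated in the text


def unit_cost(components, design_fee=Decimal("0"), print_run=1):
    """Sum per-unit component costs and spread a one-time design fee over the print run."""
    total = sum(components, Decimal("0")) + design_fee / print_run
    return total.quantize(CENT, rounding=ROUND_HALF_UP)


# Manual of operations: 1.5 inch binder plus 50 pages
manual = unit_cost([Decimal("7.46"), 50 * PAGE])
# Patient binder: set of dividers, 1.5 inch binder, 200 pages of CRFs
patient_binder = unit_cost([Decimal("5.71"), Decimal("7.46"), 200 * PAGE])
# Adjudication binder: 3 inch binder, 24 dividers, 800 pages
adjudication_binder = unit_cost([Decimal("11.92"), Decimal("8.99"), 800 * PAGE])
# Pocket protocol: C$1.50 printing plus 3 h of design at C$20/h over 1300 cards
pocket_protocol = unit_cost([Decimal("1.50")], design_fee=3 * Decimal("20.00"), print_run=1300)
# Study poster: C$3.50 printing plus 4 h of design at C$20/h over 650 posters
poster = unit_cost([Decimal("3.50")], design_fee=4 * Decimal("20.00"), print_run=650)

print(manual, patient_binder, adjudication_binder, pocket_protocol, poster)
```

Working in `Decimal` rather than floating point keeps the cent rounding predictable, which matters when unit costs are later multiplied by thousands of units.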

• Postage and courier: Each clinical center will receive four
shipments of study-related materials (one per year). These
are sent by FedEx courier to ensure delivery. Each center
will also receive an average of 24 mailings per year
(regular mail) for other study items.

— Study-related material: We will ship patient binders, 
protocols, posters, etc. (60 lb box sent via courier) to 
each clinical center each year at an average cost of 
C$176.81 per box. 

— Mailings to the clinical centers: We anticipate two 
mailings per month to each clinical site for the dura- 
tion of the study. Examples of mailings include 
monthly newsletters, correspondence regarding re- 
cruitment, invitations to meetings, etc. (large envel- 
ope sent via mail to each center 24 times a year at 
an average cost of C$5.00 per envelope). 

— Center payments: We will courier center payments 
to each center four times annually for the duration 
of the trial (C$43.34 per envelope x 65 centers x 4 pay- 
ments per year). 

— Mailings to the outcomes adjudication committee 
members: We will mail adjudication packages to 
each committee member before each meeting. There 
will be 16 meetings during the 4-year study and there 
are six committee members (60 lb box sent via cour-
ier to each committee member four times a year at an
average cost of C$176.81 per box). 

— Mailings to the steering committee members: We will 
mail meeting packages to each steering committee 
member prior to each meeting. There will be 12 meet- 
ings during the 4-year study and there are eight com- 
mittee members (large envelope sent via courier to 
each committee member three times a year at an 
average cost of C$43.34). 

— Mailings to the data monitoring committee members:
We will mail meeting packages to each of the five
committee members prior to each meeting. There
will be eight meetings during the 4-year study (large
envelope sent via courier to each committee member
two times a year at an average cost of C$43.34).


Table 31.19 Summary of Mailing Costs by Year


A summary of mailing costs by year is given in Table 31.19.
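
As a rough illustration of how the yearly mailing figures arise, the sketch below multiplies the unit cost by the number of units per year and then applies the 3% annual inflation described in the table notes. The function name `yearly_costs` is an assumption made for this example; the unit costs, the 65 centers, and the mailing frequencies come from the text. It assumes inflation is compounded from the Year 1 amount and rounded to the cent only at the end.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
INFLATION = Decimal("1.03")  # 3% added each year
CENTERS = 65


def yearly_costs(unit_cost, units_per_year, years=4):
    """Year 1 cost, followed by the same cost inflated by 3% for each later year."""
    base = Decimal(unit_cost) * units_per_year
    return [
        (base * INFLATION ** year).quantize(CENT, rounding=ROUND_HALF_UP)
        for year in range(years)
    ]


# Regular mail: 24 envelopes per center per year at C$5.00 each
mailings = yearly_costs("5.00", CENTERS * 24)
# Couriered center payments: 4 per center per year at C$43.34 each
payments = yearly_costs("43.34", CENTERS * 4)

print(mailings, sum(mailings))
print(payments, sum(payments))
```

The same helper can be reused for any recurring item in the budget by changing the unit cost and the number of units per year.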


• Study Web site: The study Web site is used to store in-
formation, such as cases (blinded DataFax case report
forms, medical notes, and radiographs) for the central 
adjudication committee to review, trial protocols, man- 
uals, meeting agendas and minutes, study updates, and 
newsletters. The cost to update the site is approximately 
C$50.00 per hour. We expect 14 hours per adjudication 
meeting (16 meetings in total) and 7 hours per month 
(48 months) of updates. Therefore, the maintenance 
and set up of the Web site will cost a total of 
C$28,000.00 (C$50.00 per hour x 560 hours). We have 
averaged this out to C$7000.00 per year at an un-inflated 
rate (Table 31.20). 
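
The Web site estimate is a simple hours-times-rate calculation. A minimal sketch of the arithmetic follows; the constant names are chosen for this example only, and the figures are those given in the paragraph above.

```python
HOURLY_RATE = 50             # C$ per hour of Web site work
ADJUDICATION_MEETINGS = 16   # meetings over the whole study
HOURS_PER_MEETING = 14
STUDY_MONTHS = 48
HOURS_PER_MONTH = 7
STUDY_YEARS = 4

# Hours for posting adjudication cases plus hours for routine monthly updates
total_hours = ADJUDICATION_MEETINGS * HOURS_PER_MEETING + STUDY_MONTHS * HOURS_PER_MONTH
total_cost = total_hours * HOURLY_RATE
cost_per_year = total_cost / STUDY_YEARS  # averaged evenly, with no inflation applied

print(total_hours, total_cost, cost_per_year)
```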


Meetings, Travel, and Committee Costs 


The following meeting, travel, and committee costs have
been inflated at 3% per year.

• Ethics approval: Some review boards require a fee for the
review of proposals of industry-sponsored trials.

• Investigators meetings: The first investigators meeting
will occur prior to starting the study and all site investi- 
gators and clinical research coordinators will attend the 
meeting to review the study protocol and to discuss en- 
rollment and adherence strategies. We will hold subse- 
quent meetings to discuss trial progress, protocol adher- 
ence, adherence strategies, and maintain study enthu- 
siasm. All of the site investigators and clinical research 
coordinators will attend each meeting. Each meeting 
will take one full day. Our experience with other studies
has shown that these meetings are invaluable and have
assisted greatly in ensuring recruitment targets are met, 
as well as maintaining data quality. We will hold a meet- 
ing during each year of recruitment. Furthermore, one 




















Items to Mail                 Cost/Unit  # of Units      Year 1      Year 2      Year 3      Year 4        Total
                                   (C$)                    (C$)        (C$)        (C$)        (C$)         (C$)
Study-related material           176.81          65   11,492.65   11,837.43   12,192.55   12,558.33    48,080.96
Mailings to clinical centers       5.00        1560     7800.00     8034.00     8275.02     8523.27    32,632.29
Center payments                   43.34         260   11,268.40   11,606.45   11,954.65   12,313.28    47,142.78
Adjudication committee           176.81          24     4243.44     4370.74     4501.87     4636.92    17,752.97
Steering committee                43.34          24     1040.16     1071.36     1103.51     1136.61      4351.64
Data monitoring committee         43.34          10      433.40      446.40      459.79      473.59      1813.18
Total                                                 36,278.05   37,366.39   38,487.38   39,642.00   151,773.83


3% has been added each year to cover the cost of inflation.
Table 31.20 Summary of Central Methods Center Administration Costs by Year

Description                 Year 1       Year 2      Year 3      Year 4        Total
                              (C$)         (C$)        (C$)        (C$)         (C$)
Computer hardware          6100.00            -           -           -      6100.00
DataFax system           24,985.67    28,471.16   21,886.29   17,762.28    93,105.40
Communications           11,430.00    11,772.90   12,126.09   12,489.87    47,818.86
Randomization system     19,848.00      3963.44           -           -    23,811.44
Computer software          2578.00      1050.60     1082.12     1114.58      5825.30
Printing & supplies      68,109.62    22,314.04     1909.62     1966.91    94,300.19
Mailing                  36,278.05    37,366.39   38,487.38   39,642.00   151,773.83
Web site                   7000.00      7000.00     7000.00     7000.00    28,000.00
Total                   176,329.34   111,938.53   82,491.50   79,975.64   450,735.01


Central methods center office administration costs: C$450,735.01

3% has been added each year to cover the cost of inflation.


meeting will occur at the end of the short-term study 
and a second at the end of the long-term study to disse- 
minate results to the centers and to provide guidance to 
the centers for provision of study information. 

We will hold the meeting in concert with international 
or national orthopaedic trauma meetings to ensure costs 
are kept to a modest level. We anticipate that at least 75 
people will attend the meeting and that we will only 
have to cover the cost of 40% of the participants (includ- 
ing central methods center staff members). We will 
cover the costs of the airline tickets, ground transporta- 
tion, meals, and one night in a hotel for site investigators 
and clinical research coordinators. The study coordinator 
and research assistants will arrange all meetings, which 
has saved ~30% of overall costs in previous studies. Parti- 
cipants book airline tickets through a specified travel 
agent at least 3 weeks prior to the meeting to ensure 
that the lowest airfares are obtained. The nominated 
principal investigator, study statistician, and study coor- 
dinator are required to attend all meetings. In addition, 
our research assistants will attend to assist with registra- 
tion and the organizational aspects of the meeting. We 
estimate that it will cost C$1500.00 per person (for 30 
people) to attend for a total cost of C$45,000 (30 people 
x C$1,500 per person) per meeting per year. 

• Data monitoring committee (DMC) meetings and confer-
ence calls: The data monitoring committee members
will participate in conference calls to discuss the trial’s 
progress and to review the adverse events data. There 
will be five committee members. We anticipate that 
there will be one face-to-face meeting per year at a 
cost of C$10,000 (five people x C$2,000 per person) per 
meeting per year and one conference call per year at a 
cost of C$250.00 per call. 

• Steering committee meetings and conference calls: The
steering committee will meet each year alongside the
annual investigators meeting, so there will be no additional
travel costs for steering committee meetings. In
addition, the steering committee will have conference
calls twice a year to discuss the trial’s progress. We estimate
that each conference call will cost C$250 and the
annual cost for the steering committee conference calls
will be C$500.00 per year.

• Central outcomes adjudication committee (CAC) meetings
and conference calls: There will be six members in the
central outcomes adjudication committee. It is crucial
to have experienced committee members who have participated
in prior orthopaedic trauma studies. They will
meet in conjunction with an investigators meeting and
will have three conference calls per year. Given the
number of callers and geographic range of locations,
we estimate that each conference call will cost C$250
and that the annual cost for the central outcomes adjudication
committee calls will be C$750.00 per year.

• Travel to clinical centers and meetings: The study coordinator
will visit all study sites at least once during the
term of the study, with more frequent visits
as needed. The study coordinator will meet with the
site investigators and clinical research coordinators to
review local procedures; to discuss strategies to optimize
enrollment and protocol adherence; and to review
the study protocol, case report forms, and a random selection
of study patient files. From our experience with
previous trials, these site visits are invaluable in ensuring
the smooth running of the study. We estimate that each
site visit will cost C$1500.00 and the total cost will
be C$97,500.00 (C$1500.00 per visit x 65 visits) throughout
the study, or C$24,375.00 per year. In addition, the
study coordinator will arrange regional training sessions
for the clinical research coordinators. Training sessions
will also be provided. We estimate that each training session
will cost C$10,000 and there will be five sessions in
the first year of the study, for a total cost of C$50,000.00
(Table 31.21).
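
The clinical centers line of the meetings and travel budget combines a recurring cost (site visits spread evenly over four years) with a one-time cost (training sessions in the first year). The sketch below shows one way to lay that out; the variable names are illustrative assumptions, the amounts are taken from the paragraph above, and the 3% inflation is assumed to apply only to the recurring site-visit cost from Year 2 onward.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
INFLATION = Decimal("1.03")  # 3% added each year
YEARS = 4

site_visits_total = Decimal("1500.00") * 65    # one visit to each of 65 centers
visits_per_year = site_visits_total / YEARS    # spread evenly over the study
training_year1 = Decimal("10000.00") * 5       # five regional sessions, Year 1 only

clinical_centers = []
for year in range(YEARS):
    cost = visits_per_year * INFLATION ** year  # recurring cost, inflated after Year 1
    if year == 0:
        cost += training_year1                  # one-time cost, first year only
    clinical_centers.append(cost.quantize(CENT, rounding=ROUND_HALF_UP))

print(clinical_centers, sum(clinical_centers))
```

Keeping one-time and recurring costs in separate variables makes it easy to see which amounts are subject to inflation and which are not.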
Table 31.21 Meetings, Travel, and Communications by Year

Meetings, Travel, and Communications      Year 1      Year 2      Year 3      Year 4        Total
                                            (C$)        (C$)        (C$)        (C$)         (C$)
Investigator meetings                  45,000.00   46,350.00   47,740.50   49,172.72   188,263.22
Data safety and monitoring board       10,250.00   10,557.50   10,874.23   11,200.45    42,882.18
Steering committee                        500.00      515.00      530.45      546.36      2091.81
Adjudication committee                    750.00      772.50      795.68      819.55      3137.72
Clinical centers                       74,375.00   25,106.25   25,859.44   26,635.22   151,975.91
Total                                 130,875.00   83,301.25   85,800.29   88,374.30   388,350.83

Total meetings, travel, and communication costs: C$388,350.83

3% has been added each year to cover the cost of inflation.
Examples from the Literature: Summary of Expenses
for the Entire Trial

Cumulative expenses for the study are summarized in
Table 31.22.


Table 31.22 Total Expenses for FEMUR Study by Year

Expense Summary                             Year 1         Year 2       Year 3       Year 4          Total   % of Total
                                              (C$)           (C$)         (C$)         (C$)           (C$)
Payments to clinical sites            1,338,255.22   1,074,782.61   608,864.35   178,162.01   3,200,064.18       66.99%
Central coordination staffing           174,773.69     163,322.65   168,222.33   231,341.88     737,660.55       15.44%
Central coordination administration     176,329.34     111,938.53    82,491.50    79,975.64     450,735.01        9.44%
Meetings, travel & communications       130,875.00      83,301.25    85,800.29    88,374.30     388,350.83        8.13%
Total                                 1,820,233.25   1,433,345.04   945,378.47   577,853.83   4,776,810.58      100.00%


Total costs for FEMUR study: C$4,776,810.58

3% has been added each year to cover the cost of inflation.
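
The percentage column in Table 31.22 shows how the budget is distributed across the four expense categories. A short sketch of that calculation follows, using the category totals printed in the table; the dictionary layout and variable names are assumptions made for this example.

```python
from decimal import Decimal, ROUND_HALF_UP

# Four-year totals per category, in C$, as printed in Table 31.22
totals = {
    "Payments to clinical sites": Decimal("3200064.18"),
    "Central coordination staffing": Decimal("737660.55"),
    "Central coordination administration": Decimal("450735.01"),
    "Meetings, travel & communications": Decimal("388350.83"),
}
sum_of_categories = sum(totals.values())

# Share of each category, as a percentage rounded to two decimals
shares = {
    name: (100 * amount / sum_of_categories).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    for name, amount in totals.items()
}

for name, share in shares.items():
    print(f"{name}: {share}%")
```

A breakdown like this makes it clear at a glance that payments to the clinical sites dominate the budget, which is useful when deciding where cost savings would have the most effect.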


Preparing a Budget for a Small, Single-Center Pilot Study 


Preparing a budget for a single-center initiative is similar 
to the process of preparing a budget for a large multicenter 
initiative as described previously in this chapter. Many of 
the costs vary depending on the number of centers and 
the sample size of the trial. Other items have fixed costs 
that are independent of the trial’s sample size and number 
of centers. These items require the same amount of time or 
have the same equipment costs in a small and a larger trial. 
For example, the costs of purchasing a data management
system or a randomization system are the same, regardless
of the size of the trial. In addition, all randomized controlled
trials will require programming of the randomization
system, preparing the data management system, and
developing the case report forms. If the sample size is
smaller and the trial runs at only one center, the roles of
the clinical research coordinator, the study coordinator, and
research assistant are often combined into one full-time position.
Other items in the budget detailed above for the large de- 
finitive trial, such as shipping of material and traveling to 
clinical sites will not be relevant for a single-center initia- 
tive. The steps outlined above in the Key Concepts: Costs to 
Consider section can be followed to develop an accurate 
budget for a single-center initiative. 
Reviewing an Offer to Participate in a Clinical Trial


A colleague or a representative from industry may invite 
you to participate in a clinical trial by enrolling and follow- 
ing patients in a trial. This section will focus on reviewing 
an offer to participate in a clinical study. Participating in 
clinical trials requires a substantial commitment of both
time and effort, and participation often continues for
months or even years. There can be financial incentives
to participating in clinical trials; however, other incentives
include a chance to collaborate with other clinical investigators
and opportunities to improve knowledge about the
disease and treatment being investigated. Another advantage,
which may be offered in some trials, is exposure
to new investigative techniques or access to special equipment
or facilities.

The scientific, practical, and financial implications need
to be considered before agreeing to participate in a clinical
trial. The first item to assess is the study question and the
study methodology. It is important to ensure that the question
is relevant and that the study methodology is sound and
will meet the goals of the trial. The eligibility criteria
must be carefully assessed to ensure that your clinical site
has sufficient eligible patients to be recruited for a clinical
study. It is often necessary to perform a survey of potentially
eligible patients over a 4-week period to provide an
accurate estimate of recruitment rates.
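
To show how a short eligibility survey can be turned into a recruitment estimate, here is a minimal sketch. All inputs are hypothetical: the function `months_to_target`, the consent rate, and the survey count are assumptions introduced for illustration and do not come from the chapter. It simply scales the weekly rate of eligible patients to a monthly enrollment rate and divides the target by it.

```python
import math


def months_to_target(eligible_in_survey, survey_weeks, consent_rate, target):
    """Project the months needed to enroll `target` patients from a short eligibility survey."""
    eligible_per_week = eligible_in_survey / survey_weeks
    enrolled_per_month = eligible_per_week * consent_rate * 52 / 12
    return math.ceil(target / enrolled_per_month)


# Hypothetical inputs: 3 eligible patients seen during a 4-week survey,
# half of whom are assumed to consent, and a target of 10 patients.
months = months_to_target(eligible_in_survey=3, survey_weeks=4, consent_rate=0.5, target=10)
print(months)
```

A projection of this kind is only as good as the survey behind it, so a site with strong seasonal variation in case volume may want to survey more than one period.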


It is important to assess the impact of the trial on your
patients, as your primary obligation is to protect the welfare
of your patients. Consider whether the patients
will be required to have any investigations or procedures
that are not part of standard care, and whether these
will be painful and possibly put the patients at risk. You
need to estimate how much time the patient will devote
to the study and how much, if any, compensation the patient
may receive.

The investigator should consider all potential costs and 
prepare an accurate budget for running the clinical trial 
at his or her clinical site. These costs include: (1) adminis- 
trative assistant time; (2) clinical research coordinator 
time; (3) supplies; (4) expenses incurred by the patient;
(5) extra medical and hospital costs (e.g.,
pharmacy, radiology, laboratory tests); and (6) physician
time. It is also important to include departmental and 
institutional overhead costs in the budget and have your 
institution’s grants and contracts office as well as your 
research ethics board review your budget. 

A sample budget template for an offer to participate in a 
randomized controlled trial is given in Table 31.23. You 
have been asked to enroll 10 patients into the trial, follow 
them at the required follow-up visits, and have extra 
laboratory tests and radiographs taken. 


Key Concepts: Reviewing an Offer to Participate in a 
Clinical Trial — Sample Budget Template 


Table 31.23 Reviewing an Offer to Participate in a Clinical Trial: Sample Budget Template

Item                              Tasks                                         Cost

Administrative assistant          Filing, scheduling follow-up appointments,    25 h/patient @ C$20/h + 30% fringe benefits
                                  and photocopying                              for 10 patients = C$6500.00

Clinical research coordinator     Completion of research ethics application     70 h for the duration of the study at
                                  and communication with ethics board           C$40.13/h + 30% fringe benefits = C$3651.83
                                  throughout the study

                                  Patient tasks: Obtaining informed consent,    35 h/patient @ C$40.13/h + 30% fringe benefits
                                  ensuring patient receives treatment they are  for 10 patients = C$18,259.15
                                  randomized to, completing baseline case
                                  report forms, collecting postoperative case
                                  report forms, and completing follow-up case
                                  report forms (5 time-points)

                                  Submission of data to the central methods     3 h/patient at C$40.13/h + 30% fringe benefits
                                  center: data submitted via fax and            for 10 patients = C$156.51
                                  radiographs submitted via e-mail

                                  Monitoring visits: 10 monitoring visits that  70 h at C$40.13/h + 30% fringe benefits
                                  will last a full day each                     = C$36,518.30

Office supplies                   File folders for shadow charts, paper, pens,  C$25/patient for 10 patients = C$250.00
                                  etc.

Patient incurred expenses         Parking, lunch vouchers                       C$25/visit for 5 visits/patient for 10 patients
                                                                                = C$1250.00

Radiology                         Extra X-rays are incurred at each visit       C$200/visit for 7 visits for 10 patients
                                                                                = C$14,000.00

Laboratory medicine               Extra blood work is required at each visit    C$50/visit for 7 visits for 10 patients
                                                                                = C$3500.00

Physician (surgeon) fees          Review of case report forms, radiographs,     15 h/patient at C$200/h for 10 patients
                                  and adverse events                            = C$30,000.00

Research ethics review fee                                                      C$2000.00

Total direct costs                                                              C$116,085.79

Indirect costs (overhead costs)   30% of entire budget                          C$34,825.74

Total direct and indirect costs                                                 C$150,911.53
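
The staffing rows of the sample budget all follow the same pattern: hours, times an hourly rate, plus 30% fringe benefits, repeated per patient where applicable. The sketch below is one hedged way to reproduce that pattern; the helper `labor` and the variable names are assumptions for illustration. Note that evaluating the stated formulas for the data submission and monitoring visit rows gives C$1565.07 and C$3651.83, which differ from the amounts printed in the table (C$156.51 and C$36,518.30), so it is worth rechecking each row by hand. The overhead line is computed here from the direct cost total as printed.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
FRINGE = Decimal("0.30")     # fringe benefits added to hourly wages
OVERHEAD = Decimal("0.30")   # indirect cost rate used in the template
PATIENTS = 10


def labor(hours, hourly_rate, multiplier=1):
    """Hours x hourly rate x (1 + fringe), optionally repeated per patient."""
    cost = Decimal(hours) * Decimal(hourly_rate) * (1 + FRINGE) * multiplier
    return cost.quantize(CENT, rounding=ROUND_HALF_UP)


admin_assistant = labor(25, "20.00", PATIENTS)
ethics_application = labor(70, "40.13")
patient_tasks = labor(35, "40.13", PATIENTS)
data_submission = labor(3, "40.13", PATIENTS)   # formula result, not the printed figure
monitoring_visits = labor(70, "40.13")          # formula result, not the printed figure

direct_total_printed = Decimal("116085.79")     # total direct costs as printed in the table
indirect = (direct_total_printed * OVERHEAD).quantize(CENT, rounding=ROUND_HALF_UP)
grand_total = direct_total_printed + indirect

print(admin_assistant, ethics_application, patient_tasks)
print(data_submission, monitoring_visits)
print(indirect, grand_total)
```

Because the indirect cost rate varies by institution, the `OVERHEAD` constant is the first value to replace when adapting the template to a local budget.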


Jargon Simplified: Direct and Indirect Costs 

Direct Costs: Direct costs are the items that are directly 
related to the research initiative. These include salaries, 
equipment, supplies, and travel. 

Indirect Costs: Indirect costs are administrative over- 
head, facilities, and implicit administrative costs in- 
curred by an institution in its participation in a research 
project or study. The administration of a research trial 
needs to be housed within the framework of an institu- 
tion, hospital, or university. Such a framework provides 
the support of a business office, department of human 
resources, office of grants and contracts, academic af- 
fairs office, and library resources, as well as a whole 
range of other administrative offices and institutional 
resources. An indirect cost rate, which varies by institu- 
tion, should be included in the study budget. 


Conclusions 


Preparing an accurate and detailed budget is essential to 
the successful completion of all research trials. The sample 
budget justification can be used to complete detailed bud- 
gets for both large, multicenter trials and small single-cen- 
ter initiatives. It is also important to prepare a detailed 
budget when reviewing an offer to participate in a clinical 
trial that is being led by another investigator. 


Suggested Reading 


Bhandari M, Schemitsch EH. Beyond the basics: the organization and 
coordination of multicentre trials. Tech Orthop 2004; 19:83—87 


Total direct and indirect costs (Table 31.23): C$150,911.53


The time required for the overall coordination, the patient
enrollment and follow-up, and the data collection also
needs to be carefully considered. It is best to have an experienced
clinical research coordinator carefully review
the protocol and case report forms to provide an accurate
estimate of the time and resources required to successfully
participate in a clinical trial. Tasks often take longer
than anticipated, so it is a good idea to add in a little extra
time to account for the unanticipated. It is also important
to know whether the study will be published and what the
authorship policies are. After carefully weighing these
items, you can make an informed decision to participate
in a clinical trial.


Remember to include the application completion time 
and review board fees when preparing your study budget. 

Research Ethics, Review Boards, and Consent Forms


“The ethical conduct of research rests on three guiding principles: respect for persons,
beneficence, and justice.”

— Charles Weijer


Summary


The mandate of the institutional review board or the research
ethics board is to safeguard the rights, safety, and
well-being of all research participants. The institutional review
board or research ethics board reviews all research
projects to ensure that they meet acceptable ethical and
scientific standards and that adequate facilities and resources
are available. The institutional review board or research
ethics board can also provide advice on the ethical,
scientific, and technical aspects of planning research projects.
In this chapter, an overview of research ethics, institutional
review boards/research ethics boards, and consent
forms is provided.


Introduction


All clinical research studies must comply with government
and institutional policies. Each academic or medical institution
will have a research ethics board (if in Canada) or an
institutional review board (if in the United States) that
oversees all research done at its facility. The development
and sale of drugs and devices for human use are supervised
by government regulatory agencies to ensure that products
are safe and effective for their intended use, and
that all aspects of development, manufacturing, and clinical
investigation conform to agreed quality standards.¹ All
study investigators and research staff must comply with
the regulations specific to their local review boards, including
submitting all studies for review and maintaining
regular communication with the review board. The purpose
of this chapter is to provide an overview of research
ethics, review boards, consent forms, and obtaining informed
consent.


Phases of Clinical Trials


Appropriate regulation of clinical trials requires some formalization
of trial procedures. The Food and Drug Administration
(FDA) first described four phases for development
of a new drug for humans.¹ This terminology is now widely
accepted throughout the pharmaceutical industry (Table
32.1). It is not commonly used in academic- or investigator-initiated
trials, but it is important to know if you participate
in an industry-sponsored clinical trial.


Jargon Simplified: Academic- or Investigator-Initiated
Trial

In an academic or investigator-initiated trial, the investigator
proposes a research question and typically develops
the study protocol.


Jargon Simplified: Industry-Sponsored Trial

In an industry-sponsored trial, a pharmaceutical or device
company proposes the research questions and provides
the study protocol.


Table 32.1 Four Phases of Clinical Intervention Trials

Phase I     First investigation of a new intervention in humans,
            often called “first into man”
            Investigation of the safety and side effects of the
            intervention when given in different intensities
            Small numbers of subjects, healthy volunteers for
            devices

Phase II    Information gained in Phase I trials is used to design
            Phase II trials
            Closely controlled and monitored studies conducted in
            small numbers of patients (<100)
            Provides preliminary efficacy and safety data

Phase III   Demonstrates the efficacy and safety of an intervention
            Involves hundreds or thousands of patients
            Phase IIIb studies investigate new indications for
            interventions already in use

Phase IV    Surveillance in many thousands of patients to identify
            less common adverse events
            Postmarketing studies to investigate the use of an
            intervention in special populations of patients
Good Clinical Practice Guidelines


The International Conference on Harmonization of Technical
Requirements for Registration of Pharmaceuticals for
Human Use is a joint initiative involving regulators and
the pharmaceutical industry.¹ The International Conference
on Harmonization’s Good Clinical Practice (ICH
GCP) guideline is an international ethical and scientific
quality standard for the design, conduct, recording, and reporting
of trials involving human subjects.¹ It was developed
to protect the rights, safety, and well-being of research
participants, based on the Declaration of Helsinki,
and to assure the credibility of clinical trials. The guideline
provides a unified standard for the European Union, Japan,
and the United States to facilitate mutual acceptance of
clinical data by regulatory authorities in these jurisdictions.¹
The subject areas covered by the guideline include
a glossary, principles, institutional review board/research
ethics board, the investigator, the study sponsor, clinical
trial protocol and protocol amendments, investigator’s brochure,
and essential documents for the conduct of a clinical
trial (i.e., the investigator brochure, trial protocol, and ethical
committee approval).¹


Jargon Simplified: Study Sponsor

The primary role of the study sponsor is to carry the ultimate
responsibility for the initiation, management,
and financing (or arranging the financing) of the clinical
trial.


Clinical Trial Audits


The ICH GCP guideline defines an audit as a systematic and
independent examination of trial-related activities and
documents for industry-sponsored trials. The aim is to ensure
that trials are conducted in accordance with the trial
protocol, the sponsor’s standard operating procedures
(SOPs), and all applicable guidelines and regulatory requirements.


Jargon Simplified: Standard Operating Procedures

A standard operating procedure (SOP) is a set of instructions
having the force of a directive, covering those features
of operations that lend themselves to a definite
or standardized procedure without loss of effectiveness.
In clinical research, SOPs are defined by the International
Conference on Harmonisation as “detailed, written
instructions to achieve uniformity of the performance
of a specific function.” SOPs are necessary for a
clinical research organization - whether it concerns a
pharmaceutical company, a sponsor, a contract research
organization, an investigator site, an ethics committee or
any other party involved in clinical research - to achieve
maximum safety and efficiency of the performed clinical
research operations.


Clinical trial audits are performed by regulatory authorities,
trial sponsors, or organizations nominated by the trial
sponsor.¹ The regulatory authorities in the United States
and European Union perform audits of the sponsor, drug
development facilities, manufacturing plants, and study
sites.¹ Study sites are provided with advance warning of
an audit. Negative audit findings vary in severity from deficiencies
in essential trial documentation that can easily
be rectified, to errors in consent procedures and investigator
fraud. Serious discrepancies may lead to termination of
a trial at a study site or legal proceedings against an investigator.
It is extremely important to ensure that you are following
good clinical practice and all of the regulations
when participating in an industry-sponsored trial.


What Investigators Need to Know about Regulatory Issues 


Usually investigators do not need a detailed knowledge of
regulatory affairs to participate in an industry-sponsored
clinical trial. Industrial sponsors have regulatory affairs departments
to ensure that they comply with the current
regulations and guidelines, as mistakes may adversely affect
applications for regulatory approval.¹ Although the
study sponsor ensures that all necessary documents are
completed and submitted to the appropriate regulatory
body, the regulatory process may impose a substantial
workload on the study site personnel. Investigators must
be meticulous about data quality, the administration of
all trial documentation, and adherence to the trial protocol
and consent procedures to fulfill their responsibilities according
to the ICH GCP guidelines.¹


Key Concepts: The Investigator’s Role 

Investigators must be meticulous about data quality, the 
administration of all trial documentation, and adher- 
ence to the trial protocol and consent procedures to ful- 
fill their responsibilities according to the ICH GCP guide- 
lines. 
The Role of the Ethics Committee


An ethics committee (research ethics board/institutional 
review board) is an independent body of medical professionals
and lay members.¹ The primary responsibility of
an ethics committee is to ensure the safety, well-being, 
and human rights of the research participants. Ethics com- 
mittees review the study protocol, the case report forms, 
the study budget, the consent form, and any other study 
documentation to ensure that the trial is justified, safe, 
and that the patients are properly informed about the re- 
search. Any research involving human participants must 
be submitted to the ethics committee prior to beginning 
the trial. 

The legal status, constitution, and responsibilities of 
ethics committees may differ from country to country. 
The ICH GCP guideline recommends that an institutional
review board should comprise at least five members, at
least one of whom should have a primary interest in a
nonscientific area, and at least one of whom should be
independent of the institution or trial site.' The responsibilities
of the ethics committee are listed in Table 32.2. 
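
The composition recommendation above lends itself to a quick self-check. The following Python sketch is illustrative only: the `Member` record, its field names, and the wording of the messages are assumptions made for this example, and local regulations may impose further requirements.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Member:
    """Hypothetical description of one review board member."""
    name: str
    nonscientific: bool  # primary interest is in a nonscientific area
    independent: bool    # independent of the institution or trial site


def composition_problems(members: List[Member]) -> List[str]:
    """Return the recommendations from the text that the board does not meet."""
    problems = []
    if len(members) < 5:
        problems.append("fewer than five members")
    if not any(m.nonscientific for m in members):
        problems.append("no member with a primary interest in a nonscientific area")
    if not any(m.independent for m in members):
        problems.append("no member independent of the institution or trial site")
    return problems
```

An empty list means the three recommendations quoted above are satisfied; it says nothing about any additional national or institutional rules.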


Key Concepts: Ethics Approval 

The procedures for ethical approval vary from country to 
country and from institution to institution, so it is neces- 
sary to contact your local review board for the specific 
details. 


The research review board will have an application form
that must be completed for all new research projects
involving human participants. The form will ask for a study
summary, the inclusion and exclusion criteria, the
responsibilities of the research participants, the risks
that the research participants may experience, and any
benefits to society or the research participants that may
result from the study. The study protocol, budget, case report
forms, consent forms (described below), and any other
study documentation should be included with the application
form. Most institutions also require several signatures
from different departments within the academic institution.
These include, but are not limited to, the local responsible
investigator, the department chair or department manager, and
the managers of different hospital departments (e.g., operating
room, emergency room, hospital ward, intensive care unit).


Table 32.2 Responsibilities of the Ethics Committee

To review and process an application for ethical approval of a research protocol in a reasonable time

To consider the qualifications of the investigator

To review each ongoing trial at an institution

To recommend modifications to the patient information and consent form, when appropriate

To review payments to trial participants

To determine that, when necessary, the trial protocol addresses ethical concerns regarding consent by a patient’s legal representative and studies where prior consent is not possible

To perform duties in accordance with written operating procedures

To retain all relevant records for at least three years after completion of a trial

Source: Data from McMaster University. Research Ethics Board Guidelines. Available at: http://www.fhs.mcmaster.ca/csd/ethics/docs/hhs-fhs-guidelines.doc. Accessed July 1, 2006.

It is important to allocate a couple of days to complete 
the application form and to include this cost when budget- 
ing for a trial. In addition, some review boards charge a fee 
to review industry-sponsored trials. For instance, the 
McMaster University Research Ethics Board charges 
$2,000 to review industry-sponsored trials.’ This needs 
to be included when budgeting for a trial, as discussed in 
Chapter 31. 


Key Concepts: Budgeting 

Remember to include the application completion time 
and review board fees when preparing your study bud- 
get. 


The members of the review board independently review 
each application that they receive and then discuss them 
at a meeting. The review board often has several questions 
or concerns that must be addressed by the investigator 
prior to beginning the study. The investigator cannot begin 
the study until the review board has provided full approval 
of the study. Along with the full approval letter, the review 
board “stamps” the consent form, indicating that this con- 
sent form has been approved. It is important to remember 
to use the consent form that has been stamped by the re- 
view board. The process of ethics approval usually takes 
6 to 12 weeks at a large academic center, depending on 
how often the committee meets and how many amend- 
ments are required. 

After receiving full approval and beginning a research 
study or clinical trial, it is necessary to continue to commu- 
nicate regularly with your research ethics board/institu- 
tional review board. All amendments to the protocol, con- 
sent form, and case report forms must be promptly re- 
ported, using an amendment form. The new protocol, con- 
sent form, or case report forms cannot be implemented 
until the review board has approved them. It is a good 
idea to date each item and include a draft number on 
each item to avoid accidentally using an old version. The 
amendment procedure is typically faster than the initial
approval, but it does depend on the individual review 
board. 

In addition to reporting any revisions to the local review
board, it is also necessary to report any adverse events. An 
adverse event is defined as any untoward medical occur- 
rence in a research participant, which does not necessarily 
have a causal relationship with the treatment.! An adverse 
event can be classified as mild, moderate, or severe.' Many 
review boards require serious adverse events to be re- 
ported within 24 hours and nonserious adverse events to 
be reported within 48 hours. In addition, in an industry- 
sponsored trial, the sponsor will have guidelines, time- 
lines, and case report forms for reporting all adverse 
events. A serious adverse event is an adverse event which 
results in death, is life-threatening, requires in-patient 
hospitalization (or prolongs existing hospitalization), re- 
sults in persistent or significant disability or incapacity, 
or is a congenital anomaly or birth defect.'! A nonserious 
adverse event is any adverse event that does not meet 
the criteria of a serious adverse event. The prompt and
accurate reporting of adverse events is a protocol requirement
in most studies and is essential for the safety of the
study participants.' In addition, all adverse events are gen- 
erally reported to the data safety and monitoring board, 
which is discussed in detail in Chapter 37. 
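
The distinction between serious and nonserious adverse events, and the reporting windows mentioned above, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `AdverseEvent` record and its field names are hypothetical, the 24-hour and 48-hour windows are only the figures quoted in the text, and starting the clock when the site identifies the event is an assumption. Your review board and sponsor set the actual rules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Windows quoted in the text for "many review boards"; confirm locally.
REPORTING_WINDOW = {True: timedelta(hours=24), False: timedelta(hours=48)}


@dataclass
class AdverseEvent:
    """Hypothetical record of one adverse event (field names are assumptions)."""
    description: str
    identified_at: datetime          # assumed start of the reporting clock
    resulted_in_death: bool = False
    life_threatening: bool = False
    hospitalization: bool = False    # new in-patient stay or prolonged existing stay
    persistent_disability: bool = False
    congenital_anomaly: bool = False

    def is_serious(self) -> bool:
        """Serious if any one of the criteria listed in the text is met."""
        return any([
            self.resulted_in_death,
            self.life_threatening,
            self.hospitalization,
            self.persistent_disability,
            self.congenital_anomaly,
        ])

    def report_due(self) -> datetime:
        """Latest reporting time under the illustrative windows above."""
        return self.identified_at + REPORTING_WINDOW[self.is_serious()]
```

Note that seriousness is decided by outcome criteria, whereas the mild, moderate, or severe grading describes intensity; the sketch deliberately keeps the two separate.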


Jargon Simplified: Adverse Event 

An adverse event is “any untoward medical occurrence
in a patient or clinical investigation subject administered 
a pharmaceutical product and which does not necessa- 
rily have a causal relationship with this treatment. An 
adverse event (AE) can therefore be any unfavorable 
and unintended sign (including an abnormal laboratory 
finding), symptom, or disease temporally associated 
with the use of a medicinal (investigational) product, 
whether or not related to the medicinal (investigational) 
product.”4 


Jargon Simplified: Serious Adverse Event 

A serious adverse event is “any untoward medical occur- 
rence that at any dose results in death, is life-threaten- 
ing, requires inpatient hospitalization or prolongation 
of existing hospitalization, results in persistent or signif- 
icant disability/incapacity, or is a congenital anomaly/ 
birth defect.” 


Some research ethics boards/institutional review boards
require the reporting of protocol deviations or violations.
A protocol deviation or violation is an unanticipated or
unintentional divergence or departure from the expected
conduct of a clinical trial that is not consistent with the
current research protocol, consent document, or standard
operating procedures. The review boards may also have
explicit timelines for reporting protocol deviations and
violations.


Jargon Simplified: Protocol Deviation or Violation

A protocol deviation or violation is any action involving a
research participant or a potential research participant
that is outside the specified protocol guidelines.


Most ethics committees require annual updates and have a
specific form for submitting them. The annual update asks
about the current status of the trial (e.g., number of
research participants enrolled, number of research
participants who have completed follow-up, number of adverse
events, and any changes to the study protocol). The annual
update also asks about any published data or presentation
of abstracts at meetings.

When a trial is concluded and all data analysis is
finished, a study completion form is submitted to the research
ethics board/institutional review board. This form includes
the number of patients enrolled, the number of patients
who completed the study, the number of patients who
have been withdrawn or dropped out, the number of serious
adverse events, any publications or presentations
that have resulted, and why the study has stopped. Apart
from completion of enrollment and of the protocol,
there are several reasons that studies may be stopped:
the study never received funding, the site principal
investigator has left the institution, there were
not enough participants for the study to be completed,
the study was closed due to adverse events, or the
procedure, drug, or device has received approval.


Key Concepts: Communication

Remember to communicate regularly with your local review
board. This includes submitting annual reports, reporting
adverse events and protocol deviations, and submitting
study completion forms.


Preparing a Consent Form


The consent form and/or information sheet should be written
at a grade 8 reading level.? Consent forms and patient
information sheets may be combined or written separately.
If investigators propose to study recipients’ therapeutic
care, invitations to participate in the research
should ideally be made by persons on whom the research
participants have no dependency.’ Use of verbal consent
must be justified in the protocol.? Written consent forms
must be on the institution’s letterhead (Table 32.3).

Table 32.3 General Requirements of Informed Consent Forms


Print the information sheet and consent form on appropriate institutional letterhead. 





Include the title of the study, names of the site principal investigator, site co-investigators, and sponsor. 





Number pages appropriately. 





Include a version date. 





Ensure use of the proper names of institutions, sites, sponsors, etc.; avoid use of acronyms or abbreviations. 





The information sheet should begin with the phrase “You are being invited to participate in a research study...” 





Ensure the Information sheet is written consistently in the second person (“you”/“your”). 





The consent portion should be written in the first person singular (“I,” “me,” “my”) and should indicate that the participant understands and agrees to participate in the research.





Ensure a thorough check for spelling, punctuation, and grammar. 
Ensure that all consent materials are printed in a suitable type-size for easy reading by participants. 


Avoid use of acronyms that form actual words while abbreviating the study title. To avoid any possibility of unduly influencing partici- 
pants, do not use acronyms that may give participants the expectation of a favorable outcome (e.g., the “W.O.N.D.E.R. D.R.U.G.” study). 





Research participants must be informed if their physician will receive a fee for enrolling them in a study. 





Include a sentence identifying the person that participants should contact with any questions concerning the rights of trial participants. 





Ensure sufficient space is provided for both the printed name and signature of each person completing the consent portion (participant, 
witness, and investigator) together with the date of each signature. 





If the participant is an older minor, the participant should sign to give assent to the research in addition to the guardian’s consent. 





Include a sentence in the consent form stating that “I will receive a signed copy of this form”. 





Ensure the information sheet and consent form are at a grade 8 reading level. 





Describe procedures to ensure confidentiality of data and anonymity of participants. Provide information on length of retention and 
security of data. If information will be released to any other party for any reason, state the person/agency to whom the information will be 
furnished, the nature of the information, and the purpose of the disclosure. 


If activities are to be audio- or videotaped, describe the participant’s right to review the tapes, who will have access to them, whether they will be
used for educational purposes, and when they will be erased.


Source: Data from McMaster University. Informed Consent Checklist. Available at: http://www.fhs.mcmaster.ca/csd/ethics/docs/appln-hhs.doc. Accessed July 1, 2006.
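
One item in Table 32.3, the grade 8 reading level, can be estimated before a form is submitted. The sketch below applies the Flesch-Kincaid grade-level formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. Choosing this formula is an assumption, since the text does not say how reading level should be measured, and the syllable counter is a crude vowel-group heuristic. Treat the result as a rough screen, not as the measure your review board will apply.

```python
import re


def count_syllables(word: str) -> int:
    """Rough estimate: count runs of consecutive vowels (heuristic only)."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Treat a trailing silent 'e' as non-syllabic, e.g. "time", "take".
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Estimate the school grade needed to read the text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

Short sentences built from short words score low, and long sentences full of technical terms score high, so the function is most useful for comparing drafts of the same information sheet.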


Obtaining Informed Consent 


Informed consent is the process by which a patient voluntarily
confirms his or her willingness to participate in a clinical
trial. In addition to providing the patient with a written
consent form, approved by your local research ethics
board/institutional review board, the study investigator,
study coordinator, or delegate must verbally inform the
patient of all aspects of the trial (Table 32.4).° Different
research ethics boards/institutional review boards have
different policies regarding who can obtain informed consent;
hence, it is necessary to confirm who will be obtaining
consent prior to enrolling your first research participant.


Jargon Simplified: Informed Consent 

Informed consent is the process by which a patient voluntarily
confirms his or her willingness to participate
in a clinical trial.


Informed consent must be documented in a written form, 
signed, and personally dated by the patient or by the pa- 
tient’s legally acceptable representative, and by the person 
who conducted the informed consent discussion.® A wit- 
ness should also sign the consent form. Three copies of 
the consent form must be signed and dated by each of 
the individuals.? One signed copy goes into the patient’s 
hospital chart, the second signed copy is kept by the study 
investigator or delegate, and the third signed copy is for the 
patient. 

The informed consent discussion should be conducted in 
a quiet room and with adequate time allowed for questions. 
During the discussion, it is vital to communicate in non- 
technical language and to take into account any language 
barriers, using a translator if available.° If possible, potential 
research participants should be provided with the opportu- 
nity to discuss the study and the study requirements with 
their families prior to agreeing to participate. However, this 
is not always possible in nonelective surgical trials. 

Table 32.4 What Patients Need to Know before Participating in Clinical Research


The purpose of the research 





The trial treatment(s) and the probability for random assignment to each treatment 





The trial procedures to be followed, including all invasive procedures 





The research participant’s responsibilities 





Any aspects of the trial that are experimental 





The reasonably foreseeable risks and benefits 





Any alternative treatment(s) that may be available 





Any compensation available to the patient in the event of a trial-related injury 





The anticipated payment or expenses, if any, to the patient for participating in the trial 





The patient’s participation is voluntary and the patient may refuse to participate or withdraw from the trial at any time without prejudice


Who will have access to their original medical records 





The records identifying the patient will be kept confidential. 





If the results of the trial are published, the patient’s identity will remain confidential. 





The patient will be informed if information becomes available that may be relevant to the patient’s willingness to continue to participate in the trial.





The person to contact for additional information on the trial 





The foreseeable circumstances or reasons under which the patient’s participation in the trial may be terminated 





The expected duration of participation 





The approximate number of patients involved in the trial 


Source: From Sprague S, Hanson B, Bhandari M. Informed consent: what your patients need to know about entering clinical research studies. Can J Diagn 2003;20(10):29-31. Reprinted by permission.


Rights of Research Participants 


Patients who participate in research trials have many 
rights and they need to be informed of these rights during 
the informed consent discussion. Three of the most funda- 
mental rights are listed in Table 32.5. 


Key Concepts: Responsibilities 

The responsibility of a regulatory authority is to ensure 
that products are safe, effective for their intended use, 
and that all aspects of development, manufacturing, 
and clinical investigation conform to agreed quality 
standards. The responsibility of the ethics committee is 
to ensure that the safety, well-being, and human rights 
of patients participating in a clinical trial are protected. 


Table 32.5 Three of the Most Fundamental Patient Rights

The patient’s participation in the research trial is voluntary.

The patient has the right to refuse to participate or withdraw from the trial without providing a reason. Refusing to participate or withdrawing from the trial will not affect his or her subsequent medical care.

The patient will be informed of any new findings that may affect his or her willingness to continue participating in the trial.

Source: Data from Sprague S, Hanson B, Bhandari M. Informed consent: what your patients need to know about entering clinical research studies. Can J Diagn 2003;20(10):29-31.


Research Participant Confidentiality


The study investigator and all study staff are responsible
for maintaining patient confidentiality throughout the
clinical trial. Patient confidentiality refers to the
nondisclosure of a research participant’s identity and medical
information to nonauthorized individuals. The study
participant’s involvement in a clinical trial must be kept
private and confidential between the study investigator, the
appropriate study staff, and the primary care physician.
The information is to be shared only when there is written
permission from the study participant.

Jargon Simplified: Patient Confidentiality

Patient confidentiality refers to the nondisclosure of a
research participant’s identity and medical information
to nonauthorized individuals.
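
In practice, confidentiality is usually supported by assigning each participant a study identification number and keeping the identifying details apart from the study data, as the rest of this section describes. The Python sketch below shows one way this separation might look. It rests on assumptions throughout: the field names, the `S01` site code, and the sequential numbering scheme are all hypothetical, and the sketch does not address secure storage of the resulting key list.

```python
from typing import Dict, List, Tuple

# Hypothetical identifier fields; adapt to your own source records.
IDENTIFIER_FIELDS = {"name", "hospital_id", "billing_number", "phone", "address"}


def assign_study_ids(
    records: List[Dict[str, object]], site_code: str = "S01"
) -> Tuple[Dict[str, Dict[str, object]], List[Dict[str, object]]]:
    """Split records into a key list (identifiers) and de-identified rows."""
    key_list: Dict[str, Dict[str, object]] = {}
    crf_rows: List[Dict[str, object]] = []
    for index, record in enumerate(records, start=1):
        study_id = f"{site_code}-{index:04d}"
        # Identifiers go only into the key list, which is stored separately.
        key_list[study_id] = {
            k: v for k, v in record.items() if k in IDENTIFIER_FIELDS
        }
        # Case report form data carry the study ID and no identifiers.
        row = {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}
        row["study_id"] = study_id
        crf_rows.append(row)
    return key_list, crf_rows
```

The two return values are meant to be stored in different places, for example the key list in a locked cabinet and the de-identified rows in the study database.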


All research participants’ names and data obtained from
medical records must be kept confidential. Patients are
identified through a number that is assigned to them at
study enrollment, referred to as the study identification
number. The list linking study identification numbers to
patients should be stored separately from the study
participants’ case report forms. All data must be
stored in a secure area, such as a locked cabinet or a
password-protected computer in a locked office. The case
report forms should not include any patient identifiers,
including the patient’s name, hospital identification number,
hospital or medical billing number, or patient contact
information (e.g., telephone number, address, alternate
contact information). Even the patient’s signed consent
form should not be stored with the case report forms. In
many studies, “shadow charts” are made, which store the
patient consent form, the patient’s contact information,
alternate contact information, and any source documents
(surgical reports, clinical notes, radiograph reports, etc.).


Jargon Simplified: Shadow Chart

A shadow chart is a separate chart containing copies of
all original source documents (surgical reports, clinical
notes, radiograph reports, etc.) associated with the
research participant.


Finally, all patient identifiers must be removed when
providing data or reports to study committees such as the
steering committee, the adjudication committee, or the
data safety monitoring committee (all described in other
chapters); to study sponsors; or when presenting abstracts
at meetings or submitting manuscripts to a journal. It is
best to present data in aggregate when possible.


Conclusions


This chapter has provided an overview of research ethics,
ethics committees, and obtaining informed consent from all
research participants. It is important to communicate

Regulatory Issues in the Evaluation of a New Device or Drug


“Each problem that I solved became a rule which served afterwards to solve other problems.”
René Descartes


Summary


In this chapter, surgeons are informed about key issues in
designing and conducting studies to evaluate a new device
or drug. The regulatory bodies involved in approving a
study protocol, how they monitor the conduct of studies,
and which criteria they use to approve the device or drug
when the study is completed are the focus of this chapter.
Additionally, the phases of clinical trials and importance of
patient safety are discussed.


Introduction


A new drug or medical device may prove to be more effective,
economical, or easier to use than those currently
available. Patient safety and the efficacy of the new drug
or device must be proven before it can be widely used. In
the United States and Canada, federal agencies regulate
the testing and approval of new drugs and devices for
use in humans. Those who develop new drugs and devices
must conduct the investigational research and provide
these agencies with the relevant safety and efficacy data.
The regulatory agency reviews the information, and approves
new drugs or devices that meet preset criteria for
approval.

To navigate this process of testing and obtaining approval
of a new drug or medical device for use among humans,
it is essential for investigators to be aware of the
procedures that apply to them. These include applying for
approval by the ethics board at all institutions involved in the
investigational research and applying for federal approval
to investigate the new drug or device. During the study,
investigators must ensure compliance with federal guidelines,
regularly report patient safety data, and be prepared
for federal agency monitoring. Upon completion of the
investigational research, investigators apply for approval for
the new drug or device. Good communication with the
regulatory bodies involved will ease this process, avoiding
unnecessary delays or waste of resources.


Key Concepts: Key Steps in the Development of a New Device or Drug

1. Complete preclinical (animal) testing
2. Prepare a protocol for a clinical study
3. You may request a consultation meeting with the regulatory agency prior to applying to conduct a clinical trial
4. Apply for approval from local research ethics board
5. Apply for approval from regulatory agency to use drug or device for investigational purposes
6. Conduct clinical trials
7. Repeat Steps 2 through 5 for clinical trial phases I through III
8. Apply for approval from regulatory agency for marketing of the new drug or device


Phases of Clinical Trials for Investigation of a New Drug or Device 


The different types of trials that are conducted during the 
evaluation of a new drug or device in humans are known as 
“phases” (Table 33.1).!? 


Key Concepts: Phases of Clinical Trials’ 

“Phase I: Studies that investigate an intervention’s phy- 
siological effect or ensure that it does not manifest 
unacceptable early side-effects, often conducted in 
normal volunteers 


Phase II: Initial studies on patients, which provide preli- 
minary evidence of possible intervention effective- 
ness 

Phase III: Randomized controlled trials designed to defi- 
nitively establish the magnitude of intervention ben- 
efit 

Phase IV: Postmarketing surveillance studies, studies 
conducted after the effectiveness of an intervention 
has been established and the intervention marketed, 
typically to establish the frequency of unusual side- 
effects”! 

Table 33.1 Phases of Study for a New Drug or Device

Phase      Design                       Population                                    Number                    Focus
Phase I    Case series                  Healthy volunteers                            20-80                     Safety, toxicity
Phase II   Controlled trial             Patients with a certain disease or condition  Few dozen to ~300         Efficacy
Phase III  Randomized controlled trial  Different populations with different dosages  Several hundred to ~3000  Effectiveness
Phase IV   Survey                       Variable                                      Variable                  Safety


After safety and efficacy have been studied in animals, 
phase I trials are the first trials of a new intervention in 
humans and focus on patient safety.? These studies are 
usually a case series of healthy volunteers using incremen- 
tal doses of a drug. Patients are monitored for adverse ef- 
fects,” and to determine the maximally tolerated dose.' 


Jargon Simplified: Case Series 

A case series is “a study reporting on a consecutive col- 
lection of patients treated in a similar manner, without 
a control group. For example, a surgeon might describe 
the characteristics of an outcome for 100 consecutive 
patients with cerebral ischemia who received a revascu- 
larization procedure.”! 


Phase II trials usually focus on the efficacy of an intervention
given to groups of patients under different conditions,
such as intensity, while using objective measurements of
its therapeutic effects.’ If the therapeutic effects are more
subjective, investigators can randomize patients to the
new intervention or placebo. No additional trials will
take place if the intervention is ineffective or has too
many adverse effects, but if there is a good response with
tolerable adverse effects, a phase III trial will be planned.


Jargon Simplified: Efficacy 
“Efficacy refers to whether an intervention works in 
people who receive it.” 


Phase III trials are usually randomized controlled trials
that compare the effectiveness of the new intervention to 
the current standard of care.” These trials are also useful 
to compare rates of adverse effects.! 


Jargon Simplified: Randomization 

Randomization is the “allocation of individuals to groups 
by chance, usually done with the aid of a table of random
numbers. Not to be confused with systematic allocation 
(e.g., on even and odd days of the month) or allocation at 
the convenience or discretion of the investigator.”! 
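
The contrast drawn in this definition can be made concrete with a short Python sketch. It is a simplified illustration under stated assumptions: the arm labels and function names are invented for this example, and a computer-generated sequence stands in for the table of random numbers. It shows simple (unrestricted) randomization only and leaves out blocking, stratification, and allocation concealment, which real trials normally require.

```python
import random
from typing import List, Optional, Sequence

ARMS = ("new intervention", "standard care")  # hypothetical arm labels


def simple_randomization(
    n_participants: int,
    arms: Sequence[str] = ARMS,
    seed: Optional[int] = None,
) -> List[str]:
    """Allocate each participant to an arm purely by chance."""
    rng = random.Random(seed)  # a seed makes the list reproducible for audit
    return [rng.choice(arms) for _ in range(n_participants)]


def systematic_allocation(
    enrollment_days: Sequence[int], arms: Sequence[str] = ARMS
) -> List[str]:
    """Odd/even-day allocation: predictable, and therefore NOT randomization."""
    return [arms[day % 2] for day in enrollment_days]
```

With systematic allocation, anyone who knows the calendar knows the next assignment, which is why the definition warns against confusing it with randomization.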


Jargon Simplified: Effectiveness 

“Effectiveness refers to whether an intervention works 
in people to whom it has been offered. These RCTs, also 
called effectiveness trials, tend to be pragmatic because 
they try to evaluate the effects of the intervention in cir- 
cumstances similar to those found by clinicians in their 
daily practice.” 


Phase IV trials are also called postmarketing surveillance 
studies, and are usually surveys that monitor long-term 
or rare adverse effects of an intervention after it has been 
approved.” These can be used to confirm the safety of a 
new intervention.! Postmarketing surveillance can also 
be used to further study intervention interactions. 


Ethics Approval for Investigation of a New Drug or Device 


The first step in beginning to study a new drug or device in 
humans is to obtain ethics approval from the primary in- 
stitution conducting the research, and any other institu- 
tions that will be participating. This involves submitting 
your research protocol, information and consent form, 
data collection forms, and all other relevant material to 
the local research ethics board (REB). The REB may also 
be called an institutional review board (IRB), or indepen- 
dent ethics committee (IEC), but will generally serve the 
same function. 


Jargon Simplified: Independent Ethics Committee 

An independent ethics committee is “an independent 
body (a review board or a committee, institutional, re- 
gional, national, or supranational), constituted of medi- 
cal professionals and non-medical members, whose re- 
sponsibility it is to ensure the protection of the rights, 
safety and well-being of human subjects involved in a 
trial and to provide public assurance of that protection, 
by, among other things, reviewing and approving/pro- 
viding favorable opinion on, the trial protocol, the suit- 
ability of the investigator(s), facilities, and the methods 
and material to be used in obtaining and documenting 
informed consent of the trial subjects.”? 

The REB reviews this information to determine whether
the research plan is scientifically sound and justified,
whether the procedures for obtaining informed consent ensure
that patients are fully informed about the risks involved, and
whether all adverse events will be reported in a timely
fashion. The REB will also review any conflicts of interest
and the financial agreements involved with the study. Each
institution will have its own local guidelines about
applications to the REB, standardized application forms, and
templates for patient information and consent forms.


Jargon Simplified: Informed Consent 

Informed consent refers to “a potential participant’s ex- 
pression of willingness, after full disclosure of the impli- 
cations, to participate in a study.”! 


Federal Regulation of New Drugs and Devices 


Health Canada and the Food and Drug Administration
(FDA) in the United States are both federal agencies that
regulate the investigation and approval of new drugs or
medical devices. In Canada, medical devices and drug
products are regulated under the Food and Drugs Act and
Regulations. In the United States, devices and drugs are
regulated under the Federal Food, Drug, and Cosmetic Act.
The relevant legislation, guidelines, and application forms
for the investigation or approval of a new drug or device
are available from these agencies. These regulations
typically also apply to the use of a marketed drug or device
outside of its approved dosing or indications.

The role of these authorities is to review protocols for 
clinical trials to assess the safety of participants and the 
quality of the drug or device, ensure that the protocol has 
been reviewed by a REB, verify the qualifications of the in- 
vestigator, and monitor adverse events.” 

Many countries currently follow the guidelines of the
International Conference on Harmonisation of Technical
Requirements for Registration of Pharmaceuticals for Human
Use (ICH). ICH is a joint initiative of the European Union,
Japan, and the United States to harmonize the testing and
approval processes for new drugs, and ensure their safety and
effectiveness. Among other countries, Canada has also
adopted the ICH guidelines. Of particular interest to
clinical trialists is the ICH guidance document on good clinical
practice (GCP).? It is important to be familiar with these
guidelines, as well as the guidance documents of your
federal regulatory authority, when planning to evaluate a new
drug or device.


Jargon Simplified: Good Clinical Practice 

Good clinical practice refers to “a standard for the design, 
conduct, performance, monitoring, auditing, recording, 
analyses, and reporting of clinical trials that provides as- 
surance that the data and reported results are credible 
and accurate, and that the rights, integrity, and confi- 
dentiality of trial subjects are protected.”? 


Initiating an Investigation of a New Drug or Device 


The investigation of a new drug or device can be initiated 
by an investigator or a commercial sponsor. To begin inves- 
tigating a new drug or device in humans, your federal reg- 
ulatory agency typically will require you to submit an ap- 
plication for approval to use the drug or device for investi- 
gational purposes. If your study involves both a new drug 
and a new device, you must apply for approval for both 
(Fig. 33.1). 


Jargon Simplified: Investigator 

The investigator is “a person responsible for the conduct
of the clinical trial at a trial site. If a trial is conducted by a
team of individuals at a trial site, the investigator is the 
responsible leader of the team and may be called the 
principal investigator.” 


Jargon Simplified: Sponsor 

A sponsor is “an individual, company, institution, or or- 
ganization which takes responsibility for the initiation, 
management, and/or financing of a clinical trial.”? 


Jargon Simplified: Sponsor-Investigator

A sponsor-investigator is “an individual who both initi- 
ates and conducts, alone or with others, a clinical trial, 
and under whose immediate direction the investiga- 
tional product is administered to, dispensed to, or used 
by a subject. The term does not include any person other 
than an individual (e.g., it does not include a corporation 
or an agency). The obligations of a sponsor-investigator 
include both those of a sponsor and those of an investi- 
gator.”? 
Study Conduct and Monitoring


The sponsor of a clinical trial is responsible for appointing 
monitors to oversee the conduct of the trial. They are ap- 
pointed to ensure the rights of participants are protected, 
that the data are accurate and complete, and that the con- 
duct of the trial complies with the protocol and regulatory 
requirements.” 


Jargon Simplified: Monitoring 

Monitoring is “the act of overseeing the progress of a
clinical trial, and of ensuring that it is conducted,
recorded, and reported in accordance with the protocol,
Standard Operating Procedures (SOPs), Good Clinical
Practice (GCP), and the applicable regulatory
requirement(s).”?


Fig. 33.1 [Figure not reproduced; source: http://www.fda.gov/cder/handbook/ind.htm (CDER, Center for Drug Evaluation and Research).]


As a separate process, the sponsor may initiate an indepen- 
dent audit to ensure quality of the study, compliance with 
the study protocol, standard operating procedures (SOP), 
good clinical practices (GCP), and all regulatory require- 
ments.’ 


Jargon Simplified: Audit 
An audit is “a systematic and independent examination 
of trial related activities and documents to determine 
whether the evaluated trial related activities were
conducted, and the data were recorded, analyzed and
accurately reported according to the protocol, sponsor's
standard operating procedures (SOPs), Good Clinical Practice
(GCP), and the applicable regulatory requirement(s).”?


Additionally, the sponsor may establish an independent 
data-monitoring committee (IDMC) to review the trial 
progress, safety, and efficacy data, and make recommenda- 
tions on the continuation of the trial.? 


Jargon Simplified: Independent Data-Monitoring 
Committee (Data and Safety Monitoring Board, 
Monitoring Committee, Data Monitoring Committee) 
“An independent data-monitoring committee that may 
be established by the sponsor to assess at intervals the 
progress of a clinical trial, the safety data, and the critical 
efficacy end points, and to recommend to the sponsor 
whether to continue, modify, or stop a trial.” 


The regulatory agency that approved the clinical trial
typically reserves the right to inspect any ongoing trial. For
example, Health Canada’s Health Products and Food Branch
Inspectorate has a mandate to inspect clinical trials, and
inspects ~2% of trials each year as well as conducting
inspections in response to complaints.°


Jargon Simplified: Inspection

An inspection refers to “the act by a regulatory authority(ies)
of conducting an official review of documents, facilities,
records, and any other resources that are deemed
by the authority(ies) to be related to the clinical trial
and that may be located at the site of the trial, at the
sponsor's and/or contract research organization’s
(CRO’s) facilities, or at other establishments deemed
appropriate by the regulatory authority(ies).”?


It is important to notify the REB and regulatory agency of
any amendments to your study, and submit the required
amendment forms to each of these bodies. Additionally,
an annual report is usually required by the REB and
regulatory agency.


Reporting Patient Safety Information


Any adverse events or reactions that occur during a clinical
trial of a new drug or device must be reported in a timely
fashion to both the REB and the regulatory authority. These
are usually classified as adverse drug reactions (ADR),
adverse events (AE), serious adverse events (SAE), and
serious adverse drug reactions (Serious ADR), and should be
defined in the study protocol.


Jargon Simplified: Adverse Drug Reaction

“All noxious and unintended responses to a medicinal
product related to any dose should be considered
adverse drug reactions. The phrase responses to a
medicinal product means that a causal relationship between
a medicinal product and an adverse event is at least a
reasonable possibility, i.e., the relationship cannot be
ruled out.”


Jargon Simplified: Adverse Event

An adverse event is “any untoward medical occurrence
in a patient or clinical investigation subject administered
a pharmaceutical product and which does not
necessarily have a causal relationship with this treatment. An
adverse event (AE) can therefore be any unfavorable
and unintended sign (including an abnormal laboratory
finding), symptom, or disease temporally associated
with the use of a medicinal (investigational) product,
whether or not related to the medicinal (investigational)
product.”?


Jargon Simplified: Serious Adverse Event or Adverse
Drug Reaction

A serious adverse event or adverse drug reaction is “any
untoward medical occurrence that at any dose results in
death, is life-threatening, requires inpatient
hospitalization or prolongation of existing hospitalization, results
in persistent or significant disability/incapacity, or is a
congenital anomaly/birth defect.”?


Approval of a New Drug or Device


Once the clinical trials of a new drug or device are
completed, the sponsor may submit an application to the
regulatory authority to obtain approval to market the new
drug or device. During this process, the regulatory
authority reviews information on the drug safety and
effectiveness, the proposed labeling for the new product, and the
manufacturing methods along with the measures that
will be used to ensure the product’s quality (Fig. 33.2).
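
The adverse event categories defined in this chapter (AE, ADR, SAE, and Serious ADR) can be read as a two-question decision: can a causal relationship with the product be ruled out, and does the event meet any of the seriousness criteria? The following Python sketch illustrates that logic only. The `Event` record layout, its field names, and the example events are hypothetical and invented for illustration; a real protocol would define its own reporting forms and timelines.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One untoward medical occurrence in a trial subject (hypothetical record layout)."""
    description: str
    causal_link_possible: bool          # True if a causal relationship cannot be ruled out
    death: bool = False
    life_threatening: bool = False
    hospitalization: bool = False       # inpatient admission or prolonged existing stay
    persistent_disability: bool = False
    congenital_anomaly: bool = False


def classify(event: Event) -> str:
    """Return 'AE', 'ADR', 'SAE', or 'Serious ADR' following the quoted definitions."""
    serious = any([
        event.death,
        event.life_threatening,
        event.hospitalization,
        event.persistent_disability,
        event.congenital_anomaly,
    ])
    if event.causal_link_possible:
        return "Serious ADR" if serious else "ADR"
    return "SAE" if serious else "AE"


# Illustrative, invented events (not taken from any trial)
examples = [
    Event("transient nausea from an unrelated illness", causal_link_possible=False),
    Event("injection-site rash", causal_link_possible=True),
    Event("fall at home requiring admission", causal_link_possible=False, hospitalization=True),
    Event("major bleeding requiring reoperation", causal_link_possible=True, life_threatening=True),
]
labels = [classify(e) for e in examples]
print(labels)
```

Keeping causality and seriousness as separate questions mirrors the definitions: an event can be serious without being drug-related, and drug-related without being serious.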
Fig. 33.2 New Drug Application (NDA) Review Process. [Courtesy of the U.S. Food and Drug Administration, http://www.fda.gov/cder/handbook/nda.htm (CDER, Center for Drug Evaluation and Research).]


Examples from the Literature: Example of a Study of a 
New Drug 

Abstract 

Source: Turpie AG, Bauer KA, Eriksson BI, Lassen MR. 
Fondaparinux vs. enoxaparin for the prevention of ve- 
nous thromboembolism in major orthopedic surgery: 
a meta-analysis of 4 randomized double-blind studies. 
Arch Intern Med. 2002;162(16):1833-1840.° 
Background: Orthopedic surgery remains a condition at 
high risk of venous thromboembolism (VTE). Fondapar- 
inux, the first of a new class of synthetic selective factor 
Xa inhibitors, may further reduce this risk compared with 
currently available thromboprophylactic treatments. 


Methods: A meta-analysis of 4 multicenter, randomized, 
double-blind trials in patients undergoing elective hip 
replacement, elective major knee surgery, and surgery 
for hip fracture (n = 7344) was performed to determine 
whether a subcutaneous 2.5-mg, once-daily regimen of 
fondaparinux sodium starting 6 hours after surgery 
was more effective and as safe as approved enoxaparin 
regimens in preventing VTE. The primary efficacy out- 
come was VTE up to day 11, defined as deep vein throm- 
bosis detected by mandatory bilateral venography or 
documented symptomatic deep vein thrombosis or pul- 
monary embolism. The primary safety outcome was 
major bleeding. 
Results: Fondaparinux significantly reduced the incidence
of VTE by day 11 (182 [6.8%] of 2682 patients)
compared with enoxaparin (371 [13.7%] of 2703 pa- 
tients), with a common odds reduction of 55.2% (95% 
confidence interval 45.8% to 63.1%; P < 0.001); this ben- 
eficial effect was consistent across all types of surgery 
and all subgroups. Although major bleeding occurred 
more frequently in the fondaparinux-treated group 
(P = 0.008), the incidence of clinically relevant bleeding 
(leading to death or reoperation or occurring in a critical 
organ) did not differ between groups. 

Conclusion: In patients undergoing orthopedic surgery, 
2.5 mg of fondaparinux sodium once daily, starting 6 
hours postoperatively, showed a major benefit over en- 
oxaparin, achieving an overall risk reduction of VTE 
greater than 50% without increasing the risk of clinically 
relevant bleeding. 
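
The headline figures in this abstract can be checked with simple arithmetic. The Python sketch below recomputes the crude event rates, the relative and absolute risk reductions, and the odds reduction from the reported counts. The function name `two_arm_summary` is hypothetical, and the calculation is unstratified, so the crude odds reduction is expected to differ slightly from the reported common odds reduction of 55.2%, which was pooled across the four trials.

```python
def two_arm_summary(events_trt, n_trt, events_ctl, n_ctl):
    """Crude (unstratified) effect measures for a two-arm comparison."""
    risk_trt = events_trt / n_trt
    risk_ctl = events_ctl / n_ctl
    odds_trt = events_trt / (n_trt - events_trt)
    odds_ctl = events_ctl / (n_ctl - events_ctl)
    return {
        "risk_trt": risk_trt,
        "risk_ctl": risk_ctl,
        "relative_risk_reduction": 1 - risk_trt / risk_ctl,
        "absolute_risk_reduction": risk_ctl - risk_trt,
        "odds_reduction": 1 - odds_trt / odds_ctl,
    }


# Counts reported in the abstract: VTE by day 11,
# fondaparinux 182 of 2682 versus enoxaparin 371 of 2703
summary = two_arm_summary(182, 2682, 371, 2703)
for name, value in summary.items():
    print(f"{name}: {value:.1%}")
```

The crude relative risk reduction comes out just above 50%, which is consistent with the abstract's conclusion of a risk reduction "greater than 50%".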


Examples from the Literature: Example of a Study of a
New Device

Abstract

Source: Marulanda GA, Ragland PS, Seyler TM, Mont MA.
Reductions in blood loss with use of a bipolar sealer for
hemostasis in primary total knee arthroplasty. Surg
Technol Int 2005; 14:281-6.”

Fifty primary total knee arthroplasties were performed
in a prospective, randomized study comparing the use
of a bipolar sealer device versus conventional
electrocautery as the method of hemostasis. Both cohorts
were evaluated for intraoperative blood loss, transfusion
rate, postoperative drainage, hemoglobin levels, and
Knee Society scores. A significant reduction in
postoperative and total blood loss was found (P = 0.05 and
P = 0.02, respectively), as well as an absence of tissue
charring and smoke production in the bipolar sealer
group. No difference in knee scores was found between
both cohorts. These results suggest that use of this
bipolar sealing device is at least as effective as standard
cautery devices and can reduce blood loss, tissue damage,
and smoke production in total knee arthroplasties
without affecting the results.


Conclusions


Regulatory work for conducting a trial on a new device or
drug starts with submitting the research protocol to the
research ethics boards of all participating centers and to the
federal agencies.

Following approval, adverse events and adverse drug
reactions should be reported and annual reports should be
submitted to both these authorities. The sponsor is
responsible for monitoring the trial. Once a drug or device has
proven safe and effective, approval by federal authorities
is needed again to market it. Without the ethics boards
and federal agencies examining the safety and integrity
of a trial during all of its stages, doing interventional
research in a responsible manner is not possible.
Strategies for Obtaining Research Funding


“The very best financial presentation is one that’s well thought out and anticipates any
questions... answering them in advance.”

— Arthur Helps


Summary


In this chapter, different strategies are discussed for
obtaining funding for research initiatives. Several different
funding sources exist including government funding,
foundation or association funding, local funding, and funding
from industry. A list of tips to secure adequate funding is
also provided.


Introduction


Adequate funding is a requirement for the success of any
research project or clinical trial. Research funding allows
the nominated principal investigator to cover
the costs of staffing a central methods center to coordinate
the clinical trial efficiently and successfully. Providing
funding for clinical centers to help offset their costs enables
them to successfully enroll and follow patients in the study.

There are several different avenues to pursue for obtaining
funding for clinical research trials. The purpose of this
chapter is to provide an overview of different funding
sources including government funding, foundation or
association funding, local funding, industry funding, and
industry-initiated trials. In addition, a list of tips is supplied
for applying for funding.


Government Funding


Most countries have large funding agencies for health
research organized and funded by the government such as
the National Institutes of Health (NIH) in the United States
and the Canadian Institutes of Health Research (CIHR) in
Canada. The CIHR supports the work of up to 10,000
researchers and trainees in universities, teaching hospitals,
and research institutes across Canada and allocates 94
cents of every dollar directly to fund Canadian health
researchers.!

Searching the Internet is a good place to start
researching government funding agencies in your country. The
administrative staff in your institution’s grants and contracts
office will likely be resourceful and knowledgeable on
government grants. Once you have identified the appropriate
agency, it is important to determine the correct funding to
apply for, as there are typically several different funding
programs. Government agencies also have calls for special
initiatives, so it is a good idea to check the Internet
frequently for these calls.

The application process to government agencies is often
complex and requires the completion of multiple forms. It
is important to determine the deadlines for submission
and the requirements for submission of a protocol, and to
leave sufficient time to complete the proposal. If an
incomplete application is submitted to a government funding
agency, the application may be returned. Talking to fellow
researchers and colleagues about their previous
experience with government funding agencies is also helpful,
as the application process can be lengthy and difficult. In
addition, mentorship can be vital in preparing a successful
application.

Government grants typically provide substantial
funding for research. Grants can be in the million-dollar range,
enough to fund a large, multicenter clinical trial fully. Many
government granting agencies require the completion of a 
successful pilot study to prove that the nominated princi- 
pal investigator is able to successfully enroll patients into 
the trial, follow the research protocol, and have few proto- 
col violations. The purpose of a pilot study is typically to 
prove feasibility of the trial, as the government agency 
does not want to waste scarce resources on a trial that 
may not be a success. Including co-investigators with a 
proven track record with the funding agency is often help- 
ful in proving to the agency that the trial will be completed 
successfully. In addition, many granting agencies have 
partnership programs with industry sponsors (pharma- 
ceutical companies), where the industry provides a por- 
tion of the funding and the government agency provides 
the other portion of the funding. 
Jargon Simplified: Pilot Study

A pilot study is the initial study examining a new
method or treatment, often to determine the feasibility
of a larger clinical trial.


Many government funding agencies, such as the CIHR, use a
peer-review system to ensure that the funding process is
fair and open, that taxpayers’ money is spent wisely, and
that only the best and brightest researchers are funded.'
Government agencies receive numerous applications for
funding annually; each application is evaluated by a
peer-review committee, which reviews the proposal for strengths
and weaknesses. Committees often use a process of
consensus and a rating scale to ensure that the grants of the
highest quality are funded. Protocols are often not funded
on the first submission, so do not be discouraged if your
grant is not funded immediately. The grant reviewers often
provide written feedback, which should be incorporated
into future submissions.


Jargon Simplified: Peer-Review System

Peer review is a process used for checking the work
performed by one's equals (peers) to ensure it meets
specific criteria.


Foundation or Association Funding


Foundations and professional organizations often have
funding set aside for clinical research grants. Examples
include the Orthopaedic Trauma Association (OTA), the
Orthopaedic Research and Education Foundation (OREF),
and Physicians Services Incorporated (PSI). Many
foundations or associations provide pilot or start-up funding for
clinical research. For example, a foundation may have a
funding limit of $25,000 per application.

Searching the Internet and contacting different
foundations or associations directly are good means of
identifying potential funding opportunities. Web sites, such as the
Community of Science (COS; www.cos.com), enable
researchers to search for funding in their subspecialty. One
can find funding with the COS by searching the world's
most comprehensive funding resource, with more than
22,000 records representing nearly 400,000 opportunities,
worth over $33 billion.” Information on the COS Web site is
updated daily, and all information is verified with the sponsor,
edited for consistency, and optimized for accurate
searching.” Each COS funding opportunity record includes key
information including the title, abstract, and amount; the
sponsor and contact information; the deadline; eligibility
requirements; and a Community of Science keyword (a
standardized list, applied consistently through all records).

Many institutions also have a summary list of research
funding programs and some institutions may have a list
of potential funding agencies. In addition, some countries,
such as Canada, have directories published on foundations
that may provide funding for clinical research. For
example, Imagine Canada maintains a directory of foundations
and corporations in Canada that can provide grants for
research and to organizations.’


Local Funding


Many hospitals and universities also have funding support
for research trials initiated at their institutions. Some
competitions are for all medical specialties, whereas others are
for specific subspecialties, such as orthopaedic surgery.
Your institution’s Web site will likely have information on
local grant programs. In addition, newsletters, memos,
and e-mails can also provide valuable information on local
funding initiatives. Local funding is typically small, and
meant for pilot studies or start-up funding for a trial, which
can strengthen an application to a larger government
agency. Rarely does local funding provide enough money
to fund a large clinical trial fully. Eligibility for
submission is usually restricted to faculty members at the
local institution. The application process is typically less
rigorous than a government agency application.


Jargon Simplified: Start-up Funding

Start-up funding is a small amount of funding that
allows the nominated principal investigator to begin a
clinical trial, while applying for sufficient funding to
complete the trial.


Industry Funding


Many orthopaedic device or pharmaceutical companies
have funding set aside to sponsor clinical research trials.
Sometimes, they are set up as a formal grant competition;
other times, they are less formal and an investigator can
send a letter to a contact at a company and work with
the company to develop a protocol to fund. The amount
of industry funding available can vary from small amounts
for pilot funding to millions of dollars to fund a large
clinical trial.

Searching the Internet, talking to company sales
representatives, or contacting companies directly are good
methods of identifying research grants offered by
industry. This can be time consuming, but industry can provide
substantial funding for clinical research. With industry
funding, it is important to determine what the sponsor’s
role is going to be. In addition, it is important to determine
what access the company will have to the data and
determine any publishing limitations.


Industry-Initiated Trials


Another means of participating in clinical research is to
participate in an industry-initiated trial. An
industry-initiated trial is a study that is developed, sponsored, and
coordinated by a pharmaceutical or device company. This
section will focus on reviewing an offer to participate in
an industry-initiated clinical trial. Participating in clinical
trials requires a substantial commitment of both time
and effort, and participation often continues for
months or even years in orthopaedic trauma trials.’ There
can be financial incentives to participating in
industry-sponsored clinical trials; however, other incentives include
a chance to collaborate with other clinical investigators
and opportunities to improve knowledge about the
disease and treatment being investigated.* Another
advantage, which may be offered in some trials, is the exposure
to new investigative techniques or access to special
equipment or facilities.*

The scientific, practical, and financial implications need
to be considered before agreeing to participate in a clinical
trial. The first item to assess is the study question and the
study methodology. It is important to ensure that it is a
relevant question and that the study methodology is sound
and will meet the goals of the trial. The eligibility criteria
must be carefully assessed to ensure that your clinical
site has sufficient eligible patients to be recruited for a
clinical study. It is often necessary to perform a prospective
screening study of potentially eligible patients over a 4-
to 8-week period to provide an accurate estimate of
recruitment numbers.

It is important to assess the impact of the trial on your
patients, as your primary obligation is to protect the
welfare of your patients. It is important to consider if the
patients will be required to have any investigations or
procedures that are not part of standard care, and whether these
will be painful and will possibly put the patients at risk.
You need to estimate how much time the patient will
devote to the study and how much, if any, compensation
the patient may receive.

The time required for the overall coordination, the
patient enrollment and follow-up, and the data collection
also needs to be considered carefully. It is necessary to
have an experienced clinical research coordinator carefully
review the protocol and case report forms to provide an
accurate estimate of the time and resources required to
participate successfully in an industry-sponsored trial. Tasks
often take longer than anticipated, so it is often good to
add in a little extra time to account for the unanticipated.
It is also important to know whether the study will be
published and what the authorship policies are. After carefully
weighing these items, orthopaedic surgeons can make an
informed decision to participate in a clinical trial.


Key Concepts: Items to Consider before Agreeing to
Participate in a Clinical Trial

• Assess the study question and the study methodology
• Review the eligibility criteria
• Assess the impact of the trial on your patients
• Determine the time required for the overall
  coordination, the patient enrollment and follow-up, and the
  data collection
• Have your clinical research coordinator carefully
  review the protocol and case report forms to provide
  an accurate estimate of the time and resources
• Ask about authorship policies


Tips for Successful Grant Applicants


• Find out about the institutional support that is available to
  you: The first step in gaining funding for your research
  effort is to make yourself completely aware of all of the
  options, in the form of financial support, that are
  available to you before you decide which grants you are going
  to submit applications for.

• Seek mentoring: The advice and supervision of your
  senior colleagues, as you may already know, is an
  invaluable source of information from which your grant
  application can greatly benefit. Having a mentor, or mentors,
  with prior experience submitting successful grant
  applications will also make the process smoother and more
  effective. It is often beneficial to have multiple
  mentors with different areas of expertise. For instance,
  having an orthopaedic surgeon who can provide surgical
  mentorship and an experienced epidemiologist
  or clinical trialist who can provide mentorship on health
  research methodology is ideal.
• Have peers review your grant application prior to
  submission: It is important to have your mentors and
  appropriate colleagues review your grant application. In larger
  trials where a steering committee exists, the steering
  committee should be involved in the development of
  the protocol and in the review of the final proposal.

• Generate preliminary data: To ensure feasibility of a large
  study, it is a good idea to generate preliminary data
  through the use of a pilot study, such as a prospective
  screening study. In a prospective screening study, the
  participating clinical centers apply the inclusion and
  exclusion criteria to all patients who present to their
  institution over a specified period to determine the number
  of eligible participants. A retrospective screening study
  can also be conducted by reviewing hospital charts
  and medical records. Having already established the
  feasibility of your study is another step in the process of
  creating a successful grant application.

• Enlist collaborators and include letters of support: If you
  have designed a multicenter initiative, it is important
  to ensure that you have a sufficient number of centers
  available to enroll patients into your trial. It is important
  to start early in recruiting centers, as this takes a
  substantial amount of time. In addition, it is best to have a
  few extra sites, as some sites may not actually recruit a
  patient. It is also important to have each site investigator
  provide a detailed letter of support and a curriculum vitae, and
  to include them as co-investigators on your grant application.

• Look at successful proposals of colleagues in your field: If
  the anatomy of your grant application resembles that
  of a successful application of a colleague in your field,
  your grant application will have a better chance of being
  successful. Try to build upon what others have already
  done.

• Have an open dialogue with your granting body: The best
  place to go for help with your application is to those who
  are going to be reviewing it. It is important to reach
  people who want to help you.

• Give yourself plenty of time to prepare your proposal: To a
  grant review board, rushed applications will appear in
  stark contrast to those that have been carefully and
  methodically prepared. It is also important, for the
  planning and execution of your study, to be comfortable with
  the timeframe that you are working in and not to rush
  things.

• Do your homework: Make it a priority to familiarize
  yourself with all of the literature, questions, and
  controversies in your area. A meta-analysis of all of the
  publications on your topic will help to clearly define the purpose
  of the study and provide a background framework to
  educate yourself and your audience on the area of
  interest. In addition, conducting a survey of surgeons to
  demonstrate the controversy in your area can also
  strengthen an application.
Jargon Simplified: Meta-analysis

A meta-analysis is “an overview that incorporates a 
quantitative strategy for combining the results of several 
studies into a single pooled or summary estimate.”> 


• Place your work in perspective: It is important to cite
  others who have contributed something significant in
  your area of interest, and to make sure you cite all of
  the sides in an area of controversy. It is important to
  provide a thorough search of the literature and to provide a
  detailed list of relevant references.

• Structure your application: Making priorities clear and
  providing a timeline for your trial will give your study
  the essential organization it needs to be funded and
  successfully executed by the nominated principal
  investigator. It is necessary to follow the
  requirements provided by the funding agency.

• Discuss potential problems: To deal with the inevitable
  problems that will arise during the course of conducting
  research, it is invaluable to have alternate strategies to
  accomplish your goals. Granting bodies will feel more
  confident in a study succeeding when there are avenues
  in place to deal with problems that may arise.

• Carefully consider your funding needs: A detailed budget
  is a requirement of every research grant, as reviewers
  will judge your competence based on the accuracy of
  your budgetary needs compared with the funding you
  are applying for.

• Have your proposal carefully analyzed: It is of the utmost
  importance that your proposal be proofread by not only
  you and your research team, but also by others. Typos
  and formatting errors will quickly make you look
  incompetent to reviewers who are potentially going to be
  investing money in your effort. In addition, critiquing
  your own work will better enable you to react to others’
  criticisms of your effort; it is always important to be
  prepared.
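
The screening study described under "Generate preliminary data" yields a count of eligible patients over a fixed period, and that count can be turned into a rough recruitment forecast for the grant application. The Python sketch below is a minimal illustration under stated assumptions: the function name `months_to_target`, the assumed consent rate, and all numbers in the example are hypothetical, and the calculation assumes every site recruits at the same steady rate, which real trials rarely achieve.

```python
import math


def months_to_target(eligible_seen, screening_weeks, consent_rate, target_n, n_sites=1):
    """Rough months needed to enroll target_n patients, extrapolating from a screening study."""
    if eligible_seen <= 0 or screening_weeks <= 0 or not 0 < consent_rate <= 1:
        raise ValueError("screening counts must be positive and consent_rate in (0, 1]")
    # Patients enrolled per week at one site = eligible per week x share who consent
    weekly_per_site = eligible_seen / screening_weeks * consent_rate
    # Scale to all sites and convert weeks to months (52 weeks / 12 months)
    monthly_all_sites = weekly_per_site * n_sites * 52 / 12
    return math.ceil(target_n / monthly_all_sites)


# Hypothetical example: 12 eligible patients seen in an 8-week screening study,
# an assumed 60% consent rate, 10 similar sites, and a target sample of 1000
estimate = months_to_target(eligible_seen=12, screening_weeks=8,
                            consent_rate=0.6, target_n=1000, n_sites=10)
print(f"Estimated enrollment period: {estimate} months")
```

Because some sites may recruit slowly or not at all, an estimate like this is best treated as optimistic, which is one reason to enlist a few extra sites.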


Key Concepts: Tips for a Successful Grant Application

• Find out about the institutional support that is
  available to you
• Seek mentoring
• Have peers review your grant application prior to
  submission
• Generate preliminary data
• Enlist collaborators and include letters of support
• Look at successful proposals of colleagues in your field
• Have an open dialogue with your granting body
• Give yourself plenty of time to prepare your proposal
• Do your homework
• Place your work in perspective
• Structure your application
• Discuss potential problems
• Carefully consider your funding needs
• Have your proposal carefully analyzed
Conclusions


In conclusion, there are multiple sources of funding avail- 
able for research initiatives. It is important to take the time 
The Roles of the Research Team


“Seventy percent of success in life is showing up.”

— Woody Allen


Summary


A qualified and experienced research team is crucial for the
success of any research initiative. Details on the roles of the
individuals who help to make a trial successful are
provided in this chapter. The roles of the central methods
center staff and of the individuals at the clinical centers are
discussed, as well as the organization of several trial
committees.


Introduction


The number of personnel required for the efficient
organization and coordination of a trial depends upon the
number and size of ongoing clinical trials. The larger the sample
size and the more complex the study protocol, the more
staff are generally required to coordinate the study. A
multicenter initiative with a sample size of over 1000 patients
will often require several research staff members to
coordinate the trial at the central methods center, including one
study coordinator, two research assistants, one data
manager, one data analyst, and an administrative assistant.’ In
addition, there will be a nominated principal investigator;


Central Methods Center Personnel


The roles of the central methods center investigators and
staff are reviewed in this first section. The role of the central
methods center is described in detail in the next chapter.


The Nominated Principal Investigator


The nominated principal investigator is responsible for the
overall design and conduct of the trial. The nominated
principal investigator is also responsible for finding sufficient
funding for the trial, applying to funding agencies, locating
sufficient clinical sites to enroll patients into the trial, and
organizing the trial committees (described below). The
nominated principal investigator is usually a clinician who
has substantial clinical and research experience.


Co-principal Investigator(s)


There can be several co-principal investigators in a large,
multicenter clinical trial. The co-principal investigators


co-principal investigators; multiple co-investigators in- 
cluding a senior statistician, an economist, and clinical tri- 
alist or research methodology experts. In addition, the clin- 
ical centers participating in the trial staff will include a site 
principal investigator, site co-investigators, a clinical re- 
search coordinator or nurse, and an administrative assis- 
tant. Study committees will include a steering committee, 
a central outcomes adjudication committee, and a data 
safety and monitoring board. The purpose of this chapter 
is to discuss the roles of each of these individuals who 
are necessary for the successful execution of a clinical trial. 


can be experts in health research methodology, epidemiol- 
ogy, clinical trials, or in the surgical field of interest. Alter- 
natively, a co-principal investigator can be a junior re- 
searcher who is being mentored by the nominated princi- 
pal investigator. In addition, co-principal investigators can 
be responsible for a geographical region, such as having 
one co-principal investigator for Europe, one for Australia, 
with the nominated principal investigator being responsi- 
ble for North America. 


Co-investigators 


There can be several co-investigators, all with different
roles. Co-investigators can be health research methodology
experts, epidemiologists, clinical trialists, economists
(described below), senior biostatisticians (described
below), or experts in the medical or surgical field(s) of
interest. The roles of the co-investigators can vary from acting
as a consultant, advisor, or mentor to being a member of
the steering committee or the central outcomes adjudication
committee (described below). It is important that all
co-investigators are involved in designing, organizing,
and conducting the clinical trial.


Economist 


Many multicenter randomized controlled trials include an 
economic analysis component or substudy (refer to Chap- 
ter 13) as part of the trial protocol. Because economic ana- 
lyses can be very complex to plan, execute, and analyze, it is 
necessary to include an experienced economist in all 
phases of the study. The economist will be responsible 
for the overall conduct of the economic analysis compo- 
nent of the trial, and may be part of the steering committee. 


Senior Biostatistician 


The senior biostatistician is responsible for planning all 
statistical analyses, including the sample size calculation, 
the interim analyses, and the final analyses. The senior 
biostatistician is a crucial member of both the steering 
committee and the central outcomes adjudication com- 
mittee. Whereas the data analyst (described below) con- 
ducts the majority of the statistical analyses, the senior 
biostatistician oversees all of the analyses. As a side note,
it is important to ensure that both the senior biostatistician
and the data analyst are blinded to the treatment
groups for the duration of the trial. The senior
biostatistician should have a doctoral degree in biostatistics
and several years of statistical experience.


Jargon Simplified: Blinding 

“The participant of interest is unaware of whether pa- 
tients have been assigned to the experimental or control 
group. Patients, clinicians, those monitoring outcomes, 
judicial assessors of outcomes, data analysts, and those
writing the paper can all be blinded or masked.” 


Collaborators 


The role of the collaborator is similar to the role of the
co-investigators, but with a lesser time commitment.
Collaborators can act as consultants or advisors; however,
their contribution is usually substantially less than that of
a co-investigator.


Study Coordinator 


The study coordinator is responsible for the organization
and the coordination of the trial. The study coordinator
communicates daily with the nominated principal
investigator, providing updates on study recruitment, data
submission, upcoming meetings, and any problems
experienced. The study coordinator is responsible for working
with the data manager (described below) to design the
case report forms, the database, and the randomization
system. The study coordinator writes the manual of
operations, any standard operating procedures required for the
trial, and the pocket protocol. The study coordinator is
responsible for the day-to-day coordination of the trial and is
the first line of communication for the clinical sites. In
addition, the study coordinator hires and supervises the
research assistant(s) and the administrative assistants,
participates in all study meetings and conference calls, prepares
the reports for the committee meetings, and provides
training and audit visits to all clinical sites. The study
coordinator should have a Master of Science degree,
specializing in Health Research Methodology, Epidemiology, or
another relevant field. The study coordinator should have at
least 5 years of relevant research experience conducting
clinical trials.


Jargon Simplified: Pocket Protocol 

A pocket protocol is a smaller version of the protocol, 
summarizing only key aspects of the protocol, including 
inclusion and exclusion criteria, randomization instruc- 
tions, follow-up instructions, and contact information 
for relevant research personnel. It is meant to be used 
as a quick reference and to supplement the full protocol 
and other study material. 


Data Analyst 


The data analyst typically has a Master’s degree in Statistics
or Mathematics, with several years of relevant work
experience. The primary role of the data analyst is to prepare
statistical reports of the data throughout the trial and at
the end of the trial. The data analyst works closely with
the study coordinator and the senior biostatistician.
Examples of data analyses include conducting any interim
analysis, preparing reports for the data safety and monitoring
board, preparing consensus tables for the central
outcomes adjudication committee, and summarizing data
for monthly newsletters. The data analyst should be
capable of using one of the common statistical packages
such as the Statistical Package for the Social Sciences
(SPSS; SPSS Inc., Chicago, IL) or Statistical Analysis
Software (SAS; SAS Institute, Cary, NC) to conduct the analyses.
Throughout the trial, the data analyst spends approximately
3 to 7 hours per week working on the trial, depending
on the number of reports required and the size of
the trial. At the end of a trial, the data analyst typically
devotes 6 to 9 months of full-time work to conducting the
analysis.


Key Concepts: Common Statistical Packages 

• Statistical Package for the Social Sciences (SPSS; SPSS
Inc., Chicago, IL)

• Statistical Analysis Software (SAS; SAS Institute, Cary,
NC)

• MiniTab (MiniTab, Inc., State College, PA)


Database Manager


The data manager is responsible for setting up and
maintaining the randomization system in a randomized
controlled trial, designing the case report forms with the study
coordinator, and programming and maintaining the
database. The data manager should have a Bachelor’s or Master’s
degree, with some relevant previous experience. The data
manager can often manage multiple projects, because each
study requires a large amount of work upfront and the
workload decreases as the trial progresses. It is important
to budget and allocate sufficient time and funding for the
data manager.
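
To make the idea of a randomization system more concrete, the sketch below generates an allocation list using permuted blocks. This is only one common approach and is an assumption on our part; the chapter does not say which randomization method a given trial should use. The function name, arm labels, block size, and seed are all hypothetical, and a real system would also need to conceal the list from the enrolling sites.

```python
import random


def permuted_block_schedule(n_patients, block_size=4, arms=("A", "B"), seed=None):
    """Build an allocation list from independently shuffled blocks.

    Hypothetical helper for illustration only: within every complete
    block, each arm appears the same number of times.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # a fixed seed makes the list reproducible
    schedule = []
    while len(schedule) < n_patients:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        schedule.extend(block)
    return schedule[:n_patients]


# Example: 12 patients, two arms, blocks of four (all values are assumptions).
schedule = permuted_block_schedule(12, block_size=4, seed=2024)
print(schedule)
```

Blocking keeps the treatment groups balanced as enrollment proceeds, which is one reason it is often chosen for multicenter trials; the senior biostatistician would normally decide the actual scheme.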


Financial Manager 


The financial manager works closely with the study
coordinator to develop the study budget, prepare financial
reports both for the study team and for sponsors, prepare
financial projections throughout the trial, and ensure that
the funding is spent within the sponsors’ guidelines. The
financial manager is also responsible for setting up contracts
with the clinical sites, to ensure that the funding for the
clinical sites can be transferred. In addition, the financial
manager can be responsible for setting up the payroll for
the other members of the study team. The financial
manager typically works on multiple trials, spending 5 to 7
hours a month on the trial. When hiring a financial
manager, a successful candidate should have a Bachelor’s degree
in Commerce, Finance, Economics, or Accounting. Having
previous experience working with clinical research is an
asset. In smaller trial centers, the study coordinator is
responsible for the role of the financial manager.


Research Assistant(s)


Large multicenter clinical trials often require several
research assistants to help the study coordinator at the
central methods center. The research assistants are
responsible for data validation, contacting clinical sites to
discuss data queries and overdue follow-up visits,
preparing adjudication packages, preparing binders of case report
forms, and taking minutes at meetings. Research assistants
should have a Bachelor of Science degree; previous
research experience is an asset.


Administrative Assistant(s)


The administrative assistant(s) also work closely with the
study coordinator. Their tasks typically include scheduling
meetings and conference calls, filing, mailings, issuing
payments to clinical centers, photocopying, and typing of
study material. The administrative assistants should have
experience as a medical secretary.


Clinical Sites


The second section of this chapter will describe the roles of
the clinical site investigators and staff. The primary role of
the clinical sites is to enroll patients into the trial,
complete follow-up visits, and submit all data to the methods
center or project office (Fig. 35.1).

Fig. 35.1 Chart of key personnel at the clinical sites.


Site Principal Investigators and Site Investigators 


Each participating clinical site often has more than one
surgeon or clinician enrolling patients into a multicenter
clinical trial. In this situation, one investigator from each
site is designated as the site principal investigator. The
site principal investigator serves as the primary contact
for the central methods center. The site principal
investigator is responsible for the conduct of the clinical trial at his
or her center: ensuring that the protocol is followed, that all
data are accurate and complete, and that good clinical practice
is followed; resolving any problems that the site may
encounter; and communicating regularly with the central
methods center. The site principal investigator is also
responsible for attending all investigator meetings and
participating in any conference calls regarding the trial. The
site principal investigator is responsible for presenting
the study protocol to his or her colleagues and getting
several site investigators to participate in the trial. In addition,
the site principal investigator is responsible for hiring
qualified research staff to coordinate the patient enrollment
and follow-up at the clinical site. In other words, the site
principal investigator is responsible for the overall conduct
of the clinical trial at his or her clinical site. The
responsibilities of the investigator are outlined in detail in the ICH GCP
guidelines, and the main elements are summarized in
Table 35.1.


Table 35.1 Responsibilities of a Site Principal Investigator

• To protect the rights, safety, and welfare of individuals who have
agreed to participate in the trial

• To ensure informed consent is obtained for all patients
participating in the trial

• To ensure that the investigation is conducted according to the
protocol, case report forms, and applicable regulations

• To ensure drugs or medical devices are stored appropriately and
are only used for the purposes of the study

• To be accountable for all investigational products

• To identify staff who are appropriately trained for their role
in the clinical trial and provide documentation of their
qualifications

• To ensure correct documentation in case report forms and
patient hospital records to enable source data verification

• To be available for monitoring visits and to permit the trial
monitors access to source data

• To store the study data for the appropriate time required by
regulations for that individual study

• To report adverse events according to the regulatory guidelines
for the individual study

The site investigators are responsible for enrolling pa- 
tients into the trial, following the study protocol, and fol- 
lowing patients, as per the study protocol. The site investi- 
gators should communicate weekly with the site principal 
investigator to discuss any problems they experience fol- 
lowing the study protocol, enrolling research participants, 
or following research participants. The site investigators 
may attend any investigator meetings or participate in 
any conference calls on behalf of the site principal investi- 
gator. 

The site principal investigator is usually a surgeon or
clinician with experience in conducting clinical research.
It is necessary for the site principal investigator to have
sufficient time to dedicate to the clinical trial, to ensure that
the trial is conducted well at the clinical center. The site
investigators are typically surgeons or clinicians with less
research experience and a high clinical caseload, who
can effectively contribute multiple patients to a trial.
The role of the site investigator takes substantially less
time than the role of site principal investigator; however,
the site investigators need to be able to dedicate enough
time to the trial to ensure that the patient has properly
consented, the case report forms are completed accurately,
and the follow-up appointments and requirements follow
the study protocol.


Clinical Research Coordinator


Each site should have a dedicated clinical research
coordinator to manage the day-to-day trial activities at the
clinical site. These include regular communication with the
local ethics committee, weekly communication with the
central methods center, obtaining patient consent,
assisting with the patient randomization, completing case
report forms, and scheduling patient follow-up
appointments. In addition, the clinical research coordinator is
responsible for submitting data to the central methods
center, responding to any queries on the data, and
participating in any meetings or conference calls. The clinical
research coordinator and the site principal investigator
work closely together to ensure compliance, data quality,
and effective communication with the central methods
center.

The clinical research coordinator should have a
Bachelor’s degree in a relevant field including nursing, health
sciences, or kinesiology. A Master’s degree is an asset for
a clinical research coordinator. In addition, the clinical
research coordinator should have some formal training in
Health Research Methodology or Clinical Trials
Management or equivalent work experience. Because many
clinical research coordinators work independently, a strong
knowledge base is a necessity. Previous experience is also
an asset for a clinical research coordinator.


Administrative Assistant


Many clinical centers have an administrative assistant who
is available to help both the site principal investigator and
the clinical research coordinator. The duties of the
administrative assistant may include scheduling meetings,
looking after the finances, scheduling appointments,
submitting data to the central methods center, and filing. The
administrative assistant should have a background as a
medical administrative assistant, with some experience and a
good understanding of medical terminology.


Trial Committees


The complexity of a trial with multiple participating
clinical centers and hundreds of participating surgeons
requires key organizing committees to oversee the conduct
of the trial, to assure patient safety, and to limit bias in
outcomes assessment.¹ The committees typically include a
steering committee, a central outcomes adjudication
committee, and a data safety and monitoring board (Fig. 35.2).
Each committee is described in detail in this section.

Fig. 35.2 Chart of the hierarchy of study committees.


Steering Committee 


The steering committee is responsible for the overall
design and conduct of the trial, and the members of this
committee may or may not be direct participants in the
proposed clinical trial.¹ Often the steering committee consists
of the nominated principal investigator, a senior
biostatistician, a trials methodologist, and other key individuals
deemed important to the design and conduct of the study.


Key Concepts: Members of the Steering Committee 

• Nominated principal investigator

• Co-principal investigators

• Senior biostatistician

• Trials methodologist

• Orthopaedic surgeons

• Other key individuals deemed important to the design
and conduct of the study


The steering committee acts as an advisory committee and
discusses and approves amendments to the protocol and
case report forms. The steering committee communicates
with the central methods center, the data safety and
monitoring board, the central outcomes adjudication
committee, and the site principal investigators on a regular basis.¹
At the completion of the trial, the steering committee
maintains responsibility for the final data analysis and
manuscript preparation on behalf of all study investigators
and participating sites.¹


Key Concepts: Responsibilities of the Steering Committee

• Responsible for the overall design and conduct of the
trial

• Acts as an advisory committee and discusses and
approves amendments to the protocol and case report
forms

• Communicates with the central methods center, the
data safety and monitoring board, the central
outcomes adjudication committee, and the site principal
investigators on a regular basis

• At the completion of the trial, the steering committee
maintains responsibility for the final data analysis and
manuscript preparation on behalf of all study
investigators and participating sites


Data Safety and Monitoring Board


All clinical trials require safety monitoring, but not all trials
require monitoring by a formal committee external to the
trial investigators. Data safety and monitoring boards have
generally been established for large, randomized
multicenter studies that evaluate interventions intended to
prolong life or reduce risk of a major adverse health outcome.¹

The data safety and monitoring board acts in an advisory
capacity to a national funding body such as the National
Institutes of Health (NIH) or Canadian Institutes of Health
Research (CIHR) to monitor patient safety and progress.¹
The data safety and monitoring board is often composed
of five to six individuals who are completely independent
of the study investigators. The data safety and monitoring
board members have no financial, scientific, or other
conflict of interest with the trial.¹ The data safety and
monitoring board members include experts or representatives
who meet some or all of the following criteria: (1) relevant
clinical or surgical expertise, (2) clinical trial methodology
experience, (3) biostatistics expertise, or (4) experience
related to medical ethics.¹ One individual is selected to serve
as the data safety and monitoring board chairperson, who
is responsible for overseeing the meetings and is the
primary contact person for the data safety and monitoring
board.¹


Key Concepts: Members of the Data Safety and 
Monitoring Board 

• Experts with relevant clinical or surgical expertise

• Clinical trial methodology experience

• Biostatistics expertise

• Experience related to medical ethics


Key Concepts: Responsibilities of the Data Safety and
Monitoring Board

• “Review the research protocol, informed consent
documents and plans for data safety and monitoring

• Evaluate the progress of the trial, including periodic
assessments of data quality and timeliness, participant
recruitment, accrual and retention, participant risk
versus benefit, performance of the trial site, and other
factors that can affect study outcome

• Consider factors external to the study when relevant
information becomes available, such as scientific or
therapeutic developments that may have an impact
on the safety of the participants or the ethics of the
trial

• Review study performance, make recommendations
and assist in the resolution of problems reported by
the Principal Investigator

• Protect the safety of the study participants

• Report on the safety and progress of the trial

• Make recommendations to the Funding Agency, the
Principal Investigator, and, if required, to the FDA
concerning continuation, termination or other
modifications of the trial based on the observed beneficial or
adverse effects of the treatment under study

• If appropriate, conducting an interim analysis of
efficacy in accordance with stopping rules which are
clearly defined in advance of data analysis and have
the approval of the Data Safety and Monitoring Board

• Ensure the confidentiality of the trial data and the
results of monitoring”¹


Individuals invited to serve on the data safety and
monitoring board as either voting or nonvoting members must
disclose any potential conflicts of interest, whether real or
perceived, to the principal investigator and the national
funding agency (if applicable).¹

The overall role of the data safety and monitoring board
encompasses the monitoring of safety and effectiveness.¹ A
fundamental responsibility of every data safety and
monitoring board is to make recommendations to the steering
committee concerning the continuation of a study.¹ Most
frequently, a data safety and monitoring board’s
recommendation after an interim review is for the study to
continue as designed. Other recommendations that might be
made include study termination, study continuation with
major or minor modifications, or temporary suspension
of enrollment and/or study intervention until some
uncertainty is resolved.¹ The role of the data safety and
monitoring board is discussed in greater detail in Chapter 37.


The Central Outcomes Adjudication Committee


The primary role of the central outcomes adjudication 
committee is to review important end points reported by 
trial investigators to determine whether they meet proto- 
col-specified criteria. Members of the central outcomes 
adjudication committee may request radiographs, chart 
notes, operative reports, and other pertinent material to 
guide their decision-making about a defined outcome. All 
attempts should be made to blind the committee to treat- 
ment allocation. This includes careful masking of all clini- 
cal notes, case report forms, radiographs, and reports. 
Such committees are most desirable when the assessment
of outcomes requires an element of judgment or
subjectivity (e.g., fracture healing) or when the intervention
cannot be blinded.¹


Jargon Simplified: Adjudication 

Adjudication refers to the review of important endpoints 
reported by trial investigators to determine whether 
they meet protocol-specified criteria. In addition, pa- 
tient eligibility and protocol deviations can also be adju- 
dicated. 


A blinded, central adjudication of outcomes can be used in
randomized controlled trials as a way of reducing bias and
random error in determining outcome events. This process
may be especially important in clinical trials when the
intervention cannot be blinded, such as in many surgical
trials, or when the diagnosis of the primary outcome has
low observer agreement. In addition to determining
outcomes, central adjudication has also been used to assess
eligibility of patients, protocol violations, and
co-interventions.
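
As an illustration of how observer agreement might be quantified, the sketch below computes Cohen’s kappa for two raters, which corrects the raw proportion of agreement for the agreement expected by chance. The chapter does not name a specific agreement statistic, so the choice of kappa is our assumption, and the ratings shown are invented for the example.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters on the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must judge the same non-empty set of cases")
    n = len(ratings_a)
    # Proportion of cases on which the two raters gave the same judgment.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical fracture-healing judgments by a surgeon and an adjudicator.
surgeon = ["healed"] * 6 + ["not healed"] * 4
adjudicator = ["healed"] * 5 + ["not healed"] * 4 + ["healed"]
kappa = cohen_kappa(surgeon, adjudicator)
print(round(kappa, 2))  # 0.58: raw agreement is 0.80, chance agreement 0.52
```

A low kappa for the primary outcome would be one argument in favor of central adjudication, although the threshold for "low" is a judgment for the steering committee.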


Jargon Simplified: Bias 

Bias is any factor or process that acts to deviate the re- 
sults or conclusions of the study away from the truth, 
causing either an exaggeration or an underestimation 
of the effects of an intervention. 


Jargon Simplified: Random Error 
Random error is related to variation of measurement 
due to chance. 


There are many factors the steering committee considers
when deciding whether to use a central adjudication
process for determining outcomes in a trial. Most importantly,
the steering committee must weigh the expected benefit
of adjudication for accurate determination of outcomes
against the substantial investment of resources involved
and the practicality of undergoing this process. The steering
committee or investigators must also consider the potential for
the adjudication process itself to bias the results of the trial.

To centrally adjudicate outcomes for a trial, it takes a 
considerable amount of administrative and expert time 
to collect the relevant information, prepare the informa- 
tion for review, review each case, and participate in con- 
sensus meetings. There may also be challenges involving 
the availability, validity, or usefulness of some documenta- 
tion sources. If the attending surgeon is also required to 
make a judgment about whether an outcome occurred, it 
may be possible to compare the committee’s judgment 
with the attending surgeon’s to determine whether central 
adjudication is necessary. In many large multicenter
randomized controlled trials, a full-time research assistant at
the central methods center is dedicated to collecting and
processing the information for adjudication and then
preparing the adjudication packages. The study coordinator
often will oversee the process and check over some of 
the packages. Therefore, a considerable amount of time 
and expense is devoted to the adjudication process. 

Once the steering committee or investigators decide to
centrally adjudicate outcomes, they must make several
other decisions about the process, including who the
adjudicators will be, what material must be evaluated, which
judgments must be made, how to train the adjudicators,
what standardized decision rules to establish, what the
committee size will be, whether decisions will be made in pairs
or by the full committee (and, if in pairs, whether to assign
cases randomly), how to reach consensus on a decision, and
how to monitor the accuracy of the process.
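
If decisions are made in pairs and cases are assigned randomly, the assignment could be generated along the following lines. This is a minimal sketch under our own assumptions: the function name, adjudicator labels, and case identifiers are hypothetical, and the round-robin dealing is simply one way to keep workloads even.

```python
import itertools
import random


def assign_cases_to_pairs(case_ids, adjudicators, seed=None):
    """Randomly assign each case to one pair of adjudicators."""
    if len(adjudicators) < 2:
        raise ValueError("at least two adjudicators are needed to form a pair")
    rng = random.Random(seed)
    pairs = list(itertools.combinations(adjudicators, 2))
    shuffled = list(case_ids)
    rng.shuffle(shuffled)  # random case order, so no pair is favored
    # Deal the shuffled cases round-robin so pair workloads differ by at most one.
    return {case: pairs[i % len(pairs)] for i, case in enumerate(shuffled)}


# Example: 12 cases shared among 4 adjudicators (6 possible pairs).
cases = [f"case-{n:02d}" for n in range(1, 13)]
assignment = assign_cases_to_pairs(cases, ["adj1", "adj2", "adj3", "adj4"], seed=7)
for case in sorted(assignment):
    print(case, assignment[case])
```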

Specific adjudication case report forms are designed by
the study coordinator and the data manager for the
adjudicators to complete for each adjudication package. The
adjudication forms are then submitted to the central methods
center and are entered into the trial’s database. After the
data have been validated, the data analyst conducts the
consensus analysis to determine which cases the committee
agreed and disagreed on. Usually, a consensus meeting or
conference call is held to resolve any disagreements.
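
The consensus analysis can be pictured as a simple comparison of the adjudicators’ judgments for each case. The sketch below is an assumption about how such a check might be written; the data layout, outcome labels, and function name are hypothetical and are not taken from any particular trial database.

```python
def consensus_analysis(votes_by_case):
    """Separate unanimous cases from those needing a consensus meeting.

    votes_by_case maps a case identifier to a dict of
    {adjudicator: judgment}. Cases with no votes are treated as disputed.
    """
    agreed, disputed = {}, []
    for case_id, votes in votes_by_case.items():
        judgments = set(votes.values())
        if len(judgments) == 1:
            agreed[case_id] = judgments.pop()  # the single shared judgment
        else:
            disputed.append(case_id)
    return agreed, sorted(disputed)


# Hypothetical adjudication forms for three cases.
votes = {
    "case-01": {"adj1": "healed", "adj2": "healed", "adj3": "healed"},
    "case-02": {"adj1": "healed", "adj2": "nonunion", "adj3": "healed"},
    "case-03": {"adj1": "nonunion", "adj2": "nonunion", "adj3": "nonunion"},
}
agreed, disputed = consensus_analysis(votes)
print("Agreed:", agreed)
print("For consensus meeting:", disputed)
```

Here only case-02 would be placed on the agenda of the consensus meeting or conference call.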

The agreement among adjudicators on a particular case 
can be affected by the number of decisions necessary, the 
number of choices for the outcome, and the complexity 
of the judgments. Disagreements can result from
encountering a case for which there is not a relevant decision rule,
forgetting a rule, not having enough information, making
an error, or facing an outcome that is difficult to determine
even with all of the relevant information.

The central outcomes adjudication committee members
review important end points to determine whether
they meet protocol-specified criteria. The adjudication
packages can consist of radiographs, chart notes, operative
reports, and other pertinent material to guide their
decision-making about a defined outcome. All attempts should
be made to blind the committee to treatment allocation.
This includes careful masking of all X-rays and reports.
Such committees are most desirable when the assessment
of outcomes requires an element of judgment or
subjectivity (e.g., fracture healing) or when the intervention cannot
be blinded. Adjudication is described further in Chapter 38.


Key Concepts: Role of the Central Outcomes
Adjudication Committee

Ultimately, the role of the central outcomes adjudication
committee is to limit bias in the outcomes assessment in
a clinical trial.


Conclusions


It is important to have an organized and qualified research
team to conduct all clinical trials and research initiatives.
Details of some of the research personnel involved in
the successful completion of a clinical trial have been
provided in this chapter, and the role of each trial committee
(steering committee, data safety and monitoring board,
and adjudication committee) has been described.
The trial’s size, complexity, and design help to determine
the role and the magnitude of the central methods center
and the trial committees.


2. Gordis L. Epidemiology. Philadelphia, PA: W.B. Saunders Co; 1996
3. International Conference on Harmonization of Technical
Requirements for Registration of Pharmaceuticals for Human Use (ICH).
Guidance for Industry: Good Clinical Practice: Consolidated Guidance
(ICH-E6). 1996. Available at:
http://www.ich.org/LOB/media/MEDIA482.pdf. Accessed October 9, 2004
4. Center for Biologics Evaluation and Research. Guidance for Clinical
Trial Sponsors: On the establishment and operation of clinical trial
data monitoring committees. Rockville, MD: Center for Biologics
Evaluation and Research, Food and Drug Administration; 2001
5. Bhandari M, Tornetta P III, Guyatt GH. Glossary of evidence-based
orthopaedic terminology. Clin Orthop Relat Res 2003;413:158-163




36 


The Role of a Central Methods Center 


“Every path to a new understanding begins in confusion.” 
Mason Cooley 


Summary 


The central methods center is vital to the success of any 
clinical trial or research initiative. The role and size of the 
central methods center depend on the study design, the 
complexity of the trial, and the size of the trial, which are 
the focus of this chapter. 


Introduction 


The foundation of any well-conducted clinical trial is a 
central methods center, also referred to as the central 
coordinating center or project office. A central methods 
center is a requirement for conducting any large multicenter 
surgical trial. All the day-to-day activities, including 
protocol development, centralized randomization, data 
management, coordination of the trial committees, and overall 
management of the clinical centers, occur at this site. The 
central methods center can be a private contract research 
organization (CRO) that monitors the trial for a group of 
clinical investigators. Alternatively, it can be an academic 
institution, directed by the nominated principal investigator 
and staff members who are employed through a university or 
hospital. The roles of key research personnel who staff the 
central methods center have been described in the previous 
chapter. This chapter will focus on when it is necessary to 
have a central methods center, the responsibilities of the 
central methods center, and how its tasks are organized. 


The Necessity of a Central Methods Center 


In all trials where rigorous methodology is followed, central 
methods center support is vital. The size of the central 
methods center and the number of staff required depend 
greatly on the complexity of the trial protocol, the study 
design, the sample size, the number of participating clinical 
centers, and the timeline for trial completion. Even a small 
single-center randomized controlled trial with a sample size 
of 50 patients will need some support with concealed 
randomization, data validation, and data analysis. Hiring a 
single study coordinator or research assistant can help to 
facilitate the completion of a small trial. It is also a good 
idea to consult with experts in research methodology, 
database design, and statistics throughout all phases of a 
clinical trial. If there are not enough qualified research 
personnel (e.g., study coordinator, research assistant, data 
analyst), the quality of the trial is likely to be reduced. 


Key Concepts: Central Methods Center 

Clinical trials, regardless of their size, benefit from 
having a central methods center and the appropriate, 
qualified research staff to administer the trial. 


There are advantages and disadvantages to having a cen- 
tral methods center coordinate a multicenter clinical trial. 
Disadvantages include the high cost of maintaining a cen- 
tral methods center, difficulty hiring qualified research 
personnel, and problems finding appropriate office space. 
The advantages to having a central methods center include 
better research methodology and rigor and having the trial 
completed in a timely manner. These advantages strongly 
outweigh the disadvantages to having a central methods 
center. To help maintain the longevity of a central methods 
center, it is ideal to have multiple funding projects being 
coordinated at one time through the same central methods 
center. In some cases, several successful nominated princi- 
pal investigators may combine their resources on multiple 
trials to maintain the central methods center. 


Key Concepts: Advantages and Disadvantages to 
Having a Central Methods Center 

Advantages 

• Better research methodology and rigor 

• Study is completed in a timely manner 

Disadvantages 

• High operating cost (e.g., salaries and supplies) 

• Finding qualified research personnel to hire (study 
coordinator, data manager, data analyst, research 
assistant, etc.) 


The Functions of a Central Methods Center 


Trial Organization 


An advantage to conducting a multicenter randomized 
controlled trial is that it increases the number of patients 
available to recruit for a trial and, as a result, decreases 
the amount of time required to complete an accrual target. 
Another advantage of a multicenter trial is that it allows 
orthopaedic surgeons with similar research interests to 
meet, exchange ideas, and pursue further collaborations 
with the aim of improving surgical care for trauma patients. 
Large multicenter trials with geographical variation in 
sites can improve the generalizability of a study. 


Key Concepts: Advantages to Multicenter Trials 

• Increases the number of patients available to recruit 
for a trial 

• Decreases the amount of time required to complete an 
accrual target 

• Allows orthopaedic surgeons with similar research 
interests to meet, exchange ideas, and pursue further 
collaborations with the aim of improving surgical 
care for trauma patients 

• Large multicenter trials with geographical variation in 
sites can improve the generalizability of a study 


A potential disadvantage of conducting a multicenter 
randomized controlled trial and using a central methods 
center is the increased cost. When applying for funding, the 
nominated principal investigator needs to include the 
cost of staffing and maintaining a central methods center 
in his or her grant application (see Chapter 31). 

The number of staff members and their roles at the central 
methods center depend on the study design and the 
study timeline. The previous chapter provided a description 
of the central methods center staff. The number of research 
personnel required for an efficient methods center 
depends upon the number and size of ongoing clinical 
trials. A multicenter initiative with a sample size of over 
1000 patients will often require a full-time study 
coordinator, a database manager, a part-time financial 
manager, a data analyst, two research assistants, and an 
administrative assistant. Central methods centers that run 
multiple trials often have operating budgets of over a 
million dollars and more than 30 or 40 paid research 
personnel.¹ 


Key Concepts: Central Methods Center Staff 

• Study coordinator 
• Database manager 
• Financial manager 
• Data analyst 
• Research assistant(s) 
• Administrative assistant(s) 


Promoting Centralized Randomization 


To ensure concealed treatment allocation, an automated, 
computerized, centralized randomization system is opti- 
mal. 


Jargon Simplified: Concealed Treatment 

Concealed treatment is “the inability of participating in- 
vestigators to determine the treatment allocation of the 
next enrolled patient into a trial.”¹ 


Many multicenter trials randomize patients by envelopes 
distributed to each center; however, envelopes are not 
tamper-proof and do not ensure concealed randomization. 
A 24-hour remote randomization system is the best 
method to conceal randomization in multicenter trials. 
This can be achieved by a 24-hour research pager system 
at the central methods center, operated by the study 
coordinator and the research assistants, or by currently 
available automated, computerized telephone randomization 
systems.¹ A disadvantage of the 24-hour research pager 
system is that the central methods center staff must be 
available and able to answer a pager quickly at all hours 
of the day and week, which can be quite costly. Errors may 
also occur if the central methods center staff provides an 
incorrect treatment allocation. Most automated systems are 
able to support the randomization of multiple trials, which 
can help to offset the initial cost of purchasing such 
equipment. Therefore, we recommend using an automated 
randomization computer for large surgical trials. 


Key Concepts: Automated Randomization System 

An automated randomization computer is advantageous 
because it guarantees concealment and reduces staff 
demands. 


The clinical research coordinator or delegate at each 
participating clinical site screens a potential research 
participant to ensure that the patient meets all eligibility 
criteria, obtains informed consent, and then proceeds with 
the randomization process. It is best to contact the 
randomization system as close as possible to the time the 
treatment is to be administered (i.e., surgery time, if it is 
a surgical trial), to reduce the chance that the study 
participant changes his or her mind about participating in 
the trial or becomes ineligible after randomization. The 
clinical research coordinator or delegate then calls a 
24-hour telephone computer randomization service, often with 
a toll-free number, at the central methods center. 
Alternatively, Internet randomization may be used in a 
similar manner. The clinical research coordinator or delegate 
enters the unique hospital site code, the patient’s date of 
birth, and any study-specific variables that are being used 
for stratification. The system may also ask the caller to 
confirm some of the key eligibility criteria. The automated 
randomization computer then reviews the information entered 
by the caller and provides a treatment allocation (e.g., 
treatment A or treatment B). 

Research staff at the central methods center are responsible 
for developing the randomization script and programming 
the automated randomization system (Fig. 36.1). Typically, 
the study coordinator and the nominated principal 
investigator write the script for randomization, the data 
analyst provides a list of random numbers and the block 
sizes, and finally, the data manager programs the script 
and the treatment allocations into the randomization 
computer. The data manager and the study coordinator 
are responsible for the maintenance of the randomization 
system throughout the trial. In addition, the study 
coordinator is in charge of ensuring that all of the clinical 
centers are properly trained to use the randomization 
system correctly. 
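
The allocation list that the data analyst supplies can be illustrated with a short sketch. The Python below is only one possible approach, assuming two treatment arms, randomly chosen block sizes of 2 or 4, and a separate list per stratum; the function names, block sizes, and stratum labels are hypothetical and are not taken from any particular randomization system.

```python
import random


def permuted_block_list(n_blocks, block_sizes=(2, 4), arms=("A", "B"), seed=None):
    """Build an allocation list from randomly permuted, balanced blocks."""
    for size in block_sizes:
        if size % len(arms) != 0:
            raise ValueError("each block size must be a multiple of the number of arms")
    rng = random.Random(seed)  # a seeded generator makes the list reproducible
    allocations = []
    for _ in range(n_blocks):
        size = rng.choice(block_sizes)            # vary block size to hinder guessing
        block = list(arms) * (size // len(arms))  # equal number of each arm per block
        rng.shuffle(block)                        # random order within the block
        allocations.extend(block)
    return allocations


def stratified_lists(strata, n_blocks, seed=0):
    """Generate one independent allocation list per stratum (hypothetical labels)."""
    return {
        stratum: permuted_block_list(n_blocks, seed=seed + i)
        for i, stratum in enumerate(strata)
    }


if __name__ == "__main__":
    lists = stratified_lists(["open fracture", "closed fracture"], n_blocks=3, seed=2024)
    for stratum, allocations in lists.items():
        print(stratum, allocations)
```

Because every block contains each arm equally often, the two groups stay balanced within each stratum, while the varying block size makes the next assignment harder to predict.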


Examples from the Literature: Example Script of a 
Telephone Randomization System for a Trial Comparing 
Plating versus Nailing 

You have reached the automated randomization system. 

Please enter your unique study and center identification 
number. {Center enters unique number.} 

To randomize a patient into the trial, please enter 1. To 
review the last patient randomized at your center, 
please press 2. To review a patient randomized 
by the study identification number, please press 3. 
{Center enters 1.} 

Does the patient meet the eligibility criteria, please enter 
1 for yes or 2 for no? {Center enters 1.} 

Has the patient provided signed informed consent for 
participation in the trial, please enter 1 for yes or 2 
for no? {Center enters 1.} 

Does the attending surgeon agree that this patient 
should be randomized into the trial, please enter 1 
for yes or 2 for no? {Center enters 1.} 

This patient has been assigned to surgical treatment 2. I 
repeat, this patient has been randomized to surgical 
treatment 2. 

Thank you for randomizing a patient into the study. 
Good-bye! 
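
The flow of the script above can be sketched in code. This is a simplified illustration under stated assumptions, not the logic of any real telephone system: the class name, the question keys, and the in-memory log are all hypothetical, and the allocation list is assumed to have been generated in advance and kept concealed from the clinical sites.

```python
from dataclasses import dataclass, field

# Hypothetical keys for the three screening questions in the script (1 = yes, 2 = no).
SCREENING_QUESTIONS = ("meets_eligibility", "signed_consent", "surgeon_agrees")


@dataclass
class RandomizationLine:
    allocations: list                      # concealed, pre-generated treatment list
    next_index: int = 0                    # position of the next unused allocation
    log: list = field(default_factory=list)

    def randomize(self, center_id, answers):
        """Return the next treatment, or None if any screening answer is not 'yes'."""
        for question in SCREENING_QUESTIONS:
            if answers.get(question) != 1:
                return None                # no allocation is revealed or consumed
        if self.next_index >= len(self.allocations):
            raise RuntimeError("allocation list exhausted")
        treatment = self.allocations[self.next_index]
        self.next_index += 1
        self.log.append((center_id, treatment))  # audit trail of every assignment
        return treatment

    def last_for_center(self, center_id):
        """Review the last patient randomized at a center (menu option 2)."""
        for logged_center, treatment in reversed(self.log):
            if logged_center == center_id:
                return treatment
        return None


if __name__ == "__main__":
    line = RandomizationLine(allocations=[2, 1, 1, 2])
    yes_to_all = {question: 1 for question in SCREENING_QUESTIONS}
    print("Assigned to surgical treatment", line.randomize("C01", yes_to_all))
```

The caller learns the assignment only after all screening questions are answered "yes", which is the point of concealment: the next allocation cannot be seen before the patient is committed to the trial.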


Centralized Data Management 


The success and integrity of any clinical trial depend upon 
the data quality and data management.'! Contemporary 
central methods centers use centralized computer 
data collection systems that have either fax-based or 
Internet-based data validation. If a fax-based system is 
used, such as the DataFax Management System² (DataFax 
Systems, Inc., Hamilton, Ontario, Canada), completed case 
report forms are faxed from participating centers directly 
into the system at the central methods center. Upon receipt 
of the data, a research assistant at the central methods 
center makes a visual check of each scanned form. Any 
missing, implausible, or inconsistent data are flagged by 
the built-in logic checks of the DataFax system and held in 
the system until they are resolved. Most Internet data 
management systems have similar setups, although the data 
can be entered directly into the Internet system at the 
clinical center, either by the clinical research coordinator 
or by the research participant. Chapter 39 describes the 
different data management systems. 
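
The three kinds of problems named above (missing, implausible, and inconsistent data) can be illustrated with a minimal sketch of a logic check. This is not how DataFax or any other product implements its checks; the field names, the age range, and the query wording are assumptions made for the example.

```python
from datetime import date

# Hypothetical required fields on a case report form.
REQUIRED_FIELDS = ("patient_id", "date_of_birth", "surgery_date")


def check_case_report_form(form):
    """Return a list of data clarification queries for one form (a dict)."""
    queries = []

    # Missing data: a required field is absent or blank.
    for name in REQUIRED_FIELDS:
        if form.get(name) in (None, ""):
            queries.append(f"missing: {name}")

    # Implausible data: a value outside an assumed plausible range.
    age = form.get("age_years")
    if age is not None and not 0 <= age <= 120:
        queries.append("implausible: age_years")

    # Inconsistent data: two fields that contradict each other.
    surgery = form.get("surgery_date")
    follow_up = form.get("follow_up_date")
    if surgery and follow_up and follow_up < surgery:
        queries.append("inconsistent: follow_up_date precedes surgery_date")

    return queries


if __name__ == "__main__":
    form = {
        "patient_id": "P001",
        "date_of_birth": "",
        "age_years": 150,
        "surgery_date": date(2024, 3, 10),
        "follow_up_date": date(2024, 3, 1),
    }
    for query in check_case_report_form(form):
        print(query)
```

Each returned query corresponds to an item that would stay open until the clinical site resolves it.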

The DataFax system generates weekly reports summar- 
izing the errors in the data and any overdue and outstand- 
ing data. These summary reports are referred to as quality 
control reports and are sent to the clinical sites automati- 
cally from DataFax via either fax or e-mail after the study 
coordinator has reviewed them. The quality control re- 
ports summarize (1) the number of patients entered into 
the trial; (2) when the patient was enrolled into the trial, 
the date of the last follow-up assessment each patient 
completed, and when the next follow-up appointment is 
due; and (3) outstanding data clarification requests and 
overdue follow-up assessments. If a clinical center does 
not respond to its quality control report within a couple 
of days of receiving it, the research assistant at the 
methods center contacts the clinical research coordinator by 
telephone to discuss the report and work together on 
obtaining any overdue data. If a site remains delinquent 
with its data for a longer period, the nominated principal 
investigator contacts the site principal investigator to 
discuss the concerns regarding the overdue data. This 
escalation helps ensure that the data are complete and 
received in a timely manner. 
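
The contents of a quality control report can be made concrete with a small sketch. The example below assumes a hypothetical follow-up schedule counted in days from enrollment and a simple list of patient records; the schedule, field names, and report layout are illustrative assumptions rather than the format of an actual DataFax report.

```python
from datetime import date, timedelta

# Hypothetical follow-up schedule, in days after enrollment.
FOLLOW_UP_DAYS = (14, 42, 90, 180, 365)


def next_follow_up_due(enrolled, completed_visits):
    """Date of the next scheduled visit, or None when follow-up is complete."""
    if completed_visits >= len(FOLLOW_UP_DAYS):
        return None
    return enrolled + timedelta(days=FOLLOW_UP_DAYS[completed_visits])


def quality_control_report(patients, open_queries, today):
    """Summarize enrollment, follow-up status, and outstanding items for one site."""
    rows, overdue = [], []
    for patient in patients:
        due = next_follow_up_due(patient["enrolled"], patient["completed_visits"])
        rows.append({
            "patient_id": patient["patient_id"],
            "enrolled": patient["enrolled"],
            "last_follow_up": patient.get("last_follow_up"),
            "next_due": due,
        })
        if due is not None and due < today:
            overdue.append(patient["patient_id"])  # visit date has already passed
    return {
        "n_enrolled": len(patients),
        "patients": rows,
        "overdue": overdue,
        "open_queries": list(open_queries),
    }


if __name__ == "__main__":
    patients = [
        {"patient_id": "P001", "enrolled": date(2024, 1, 1), "completed_visits": 0},
        {"patient_id": "P002", "enrolled": date(2024, 1, 25), "completed_visits": 0},
    ]
    report = quality_control_report(patients, ["P001: missing surgery_date"], date(2024, 2, 1))
    print(report["n_enrolled"], "enrolled; overdue:", report["overdue"])
```

A report of this kind gives the research assistant a concrete list to work through when telephoning a site about overdue data.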


Key Concepts: The Quality Control Report 

A quality control report summarizes: 

• The number of patients entered into the trial 

• The date a research participant was enrolled into the 
trial 

• The date of the last follow-up assessment each patient 
completed 

• The date when the next follow-up appointment is due 

• A list of overdue follow-up assessments 

• A list of outstanding data clarification requests 


Fig. 36.1 (a) Screenshot of a sample login page of an 
Internet-based randomization system. (b) Screenshot of a 
sample randomization page of an Internet-based randomization 
system. (c) Screenshot of a sample Internet-based treatment 
allocation page of a randomization system. 


The data manager is responsible for programming and 
maintaining the logic checks on the database. The research 
assistants are usually responsible for the data entry or 
validation, and consult the study coordinator when 
questions or problems arise. The data analyst is responsible 
for writing the statistical code needed to facilitate 
any data cleaning. 


Preparation and Distribution of Study Material 


The central methods center is also responsible for preparing 
and distributing all relevant study material, including 
the study protocol, the manual of operations, multiple 
copies of the case report forms, and financial material such 
as budgets, financial statements, and payments. The 
protocol (described in Chapter 25) is usually written by the 
nominated principal investigator, with input from all 
members of the steering committee and the central methods 
center study team. The manual of operations (see 
Chapter 41) is developed by the nominated principal 
investigator and the study coordinator and provides details 
on the everyday conduct of the trial. The study coordinator 
and the data manager design the study-specific case report 
forms using software specific to the data management 
system. The nominated principal investigator also provides 
substantial input and clinical expertise during the 
development of the case report forms to ensure that all 
relevant clinical data are collected efficiently (see 
Chapter 40). The research assistants at the central methods 
center are responsible for ensuring that the case report 
forms are photocopied, collated into binders, and shipped to 
the clinical centers. In large clinical trials, case report 
forms are sent in multiple shipments throughout the trial, as 
the clinical sites require additional forms. It is 
imperative that budget information, financial statements, 
and payments are sent to the clinical sites in a timely 
manner. All of this material needs to be well organized and 
easy for the personnel at the clinical centers to 
understand. 


Key Concepts: Study Material 

The following study material must be sent to the clinical 
centers: 

• Protocol 

• Manual of operations 

• Case report forms that have been collated into binders 

• Financial material including budgets, financial 
statements, and payments 


Budgeting and Accounting 


Another responsibility of the central methods center is to 
manage the study finances. The financial manager is 
primarily responsible for the trial finances, and works 
closely with the nominated principal investigator and the 
study coordinator. Tasks include preparing detailed study 
budgets before a clinical trial (see Chapter 30), accounting 
for all study expenses, completing purchase orders 
for new equipment, material, and supplies, and completing 
payroll for all research personnel at the central methods 
center. Often in multicenter clinical trials, the central 
methods center provides funding to the clinical centers to 
cover or at least offset their costs associated with 
participating in the trial. The financial manager is 
responsible for setting up legal contracts through the 
academic center’s grants and contracts office. To transfer 
funds between institutions, a contract is necessary that 
details what the funding is for, how much funding will be 
issued, when the funding will be issued, and who is 
responsible for administering the funding. 


Key Concepts: Contracts with Clinical Sites 

A legal contract is required to transfer funds between in- 
stitutions, such as when funding is transferred from the 
central methods center to a participating clinical site. 
The contract should detail how the funding will be 
used, how much funding will be issued, when the fund- 
ing will be issued, and who is responsible for the project 
at the central methods center and the clinical site. 


It is important to ensure that all funding is properly 
administered and that the funding is spent according to the 
funding agency’s guidelines. For example, some funding 
agencies do not allow alcoholic beverages to be purchased 
and others limit the amount of funding to be spent on 
travel. In addition, funding agencies often provide a 
timeline for how quickly the funding must be spent to ensure 
that the project is completed in a timely fashion. The 
financial manager communicates regularly with the funding 
agency to be aware of any changes in how the funding is 
to be administered. In addition, the financial manager 
prepares the financial reports detailing how the funding was 
spent for each funding agency. 


Fostering Communication with Clinical Centers 


Another vital role of the central methods center is to com- 
municate regularly with the site principal investigators 
and the clinical research coordinators at the clinical sites. 
Effective and efficient communication is a key factor to co- 
ordinating a successful multicenter clinical trial and helps 
to ensure that rigorous health research methodology is fol- 
lowed. The nominated principal investigator, the study co- 
ordinator, and the research assistants are the primary lines 
of communication for a multicenter trial. 

At the onset of a clinical trial, the study coordinator 
meets with the clinical research coordinator at an 
initiation visit to review the study protocol, the manual of 
operations, and the case report forms. It is important to 
establish a good relationship with each clinical center to 
ensure that the center is comfortable contacting the central 
methods center with any questions or concerns. After the 
trial begins, there should be weekly communication with 
each clinical center. E-mail and telephone are the key 
methods of communication that can be used to discuss 
aspects of the protocol and data with the clinical research 
coordinators. 


Key Concepts: Communication 

Weekly communication with each clinical site via tele- 
phone or e-mail helps to resolve any problems that 
may arise. Items to discuss during the weekly telephone 
calls or e-mails include study recruitment, submission of 
data, protocol deviations, and any problems that the 
clinical site is experiencing. The weekly communication 
also provides the clinical site with an opportunity to ask 
any questions that they may have regarding the trial. 


The study coordinator or research assistants should 
contact each clinical research coordinator weekly to discuss 
study recruitment, submission of data, protocol deviations, 
and any problems that the clinical site is experiencing. If a 
major protocol deviation occurs, a call should be organized 
between the nominated principal investigator and the site 
principal investigator as soon as possible to discuss the 
case in which the deviation occurred, why it occurred, and 
how to prevent future protocol deviations. 

Depending on the trial, staff at the central methods cen- 
ter may be required to be available 24 hours a day, 7 days a 
week to answer any questions that the site principal inves- 
tigators or the clinical research coordinators have regard- 
ing the trial. Some central methods centers have 24- 
hour, toll-free hotlines that are available for questions 
from clinical sites. Examples of questions can include clar- 
ification regarding the study’s inclusion and exclusion cri- 
teria, specifically if a certain patient meets the eligibility 
criteria, a question about the randomization system, a 
question about the treatment groups, or a question about 
when to schedule a research participant’s follow-up ap- 
pointment. In surgical trials where two surgical treat- 
ments are being compared, questions may be extremely 
time sensitive as they may arise during the actual surgical 
procedure. In this case, it is necessary to have 24-hour, 7- 
day-a-week resources available. If a multicenter trial is oc- 
curring in multiple different time zones, it is also helpful to 
have research staff at the central methods center available 
to answer questions outside of the usual business hours of 
9 a.m. to 5 p.m. The study coordinator will often be pro- 
vided with a pager and cell phone so that they can be easily 
contacted outside of working hours or when they are away 
from the office. In most studies, it is most convenient for 
the clinical centers to have someone at the central meth- 
ods center to contact after business hours if a question 
arises. 


Key Concepts: Around the Clock 

Having someone available to answer questions from the 
clinical center 24 hours a day, 7 days per week helps to 
facilitate communication in a clinical trial. 


Newsletters and memos are also efficient methods of 
communication. Monthly newsletters can summarize study 
recruitment and study participant follow-up, outline 
key aspects of the study protocol, describe any protocol 
revisions or changes to the case report forms, provide 
answers to frequently asked questions, and give details on 
any upcoming meetings or conference calls (Table 36.1). The 
newsletter can be distributed by e-mail, fax, or mail, 
depending on the preference of the clinical site, and can 
also be posted on the trial’s Web site. Newsletters 
also serve as a good reminder of the trial. Memos or formal 
letters can be sent to the clinical sites to detail protocol 
amendments, changes to the case report forms, upcoming 
meetings, approaching deadlines, problems with data 
quality, and protocol deviations. These can be sent via 
e-mail, fax, or mail. It is often a good idea for the 
nominated principal investigator or the study coordinator to 
follow up with a telephone call. 


Finally, an interactive Internet site can facilitate 
communication between the central methods center and the 
clinical sites. Most study Web sites have a section 
summarizing the trial, a section listing the membership, and 
a password-protected section that contains the protocol, 
manual of operations, the case report forms, sample consent 
forms, newsletters, and any study updates. Some 
Web sites also include a section containing frequently 
asked questions regarding the study protocol. It is impor- 
tant to maintain the Web site and keep it up to date so 
that nobody refers to an outdated protocol or case report 
form (Fig. 36.2). 


Table 36.1 Items to Include in a Newsletter 

• Summary of study recruitment by clinical site 
• Update study participant follow-up 
• Summarize rates of loss to follow-up 
• Outline key aspects of the study protocol 
• Update and discuss any protocol deviations 
• Describe revisions or changes to the case report forms 
• Provide answers to frequently asked questions 
• Summarize details on any upcoming meetings or conference 
calls 


Organization of Trial Committees and Meetings 


As described in previous chapters, several committees are 
set up to organize a multicenter clinical trial. Examples of 
trial committees include the steering committee, the central 
outcomes adjudication committee, and the data safety 
and monitoring board, which are all discussed in previous 
chapters. The central methods center staff are responsible 
for selecting appropriate members to participate in the 
committees, scheduling and organizing meetings and 
conference calls for the committees, preparing material for 
the committees (reports, adjudication packages, summaries of 
safety data), writing and distributing minutes following 
committee meetings, and ensuring that all action items 
from the meetings are followed up in a timely fashion. 
Organizing the committees and the meetings can take a 
substantial amount of time and resources. For instance, in a 
large multicenter clinical trial, a full-time research 
assistant is often required to prepare adjudication 
packages and adjudication case report forms and to schedule 
adjudication meetings. 

Many large clinical trials have annual investigators’ 
meetings to update site principal investigators on the 
trial’s progress, to discuss critical aspects of the trial 
protocol, and to address any problems or protocol deviations 
that have occurred. Investigators’ meetings are associated 
with a substantial cost and time commitment to organize. To 
help reduce costs, investigators’ meetings can be held 
alongside a national or international orthopaedic meeting, 
such as that of the American Academy of Orthopaedic Surgeons 
or the Orthopaedic Trauma Association. Items to consider when 
preparing for an investigators’ meeting are listed below. 


Jargon Simplified: Investigators’ Meeting 

An investigators’ meeting is a meeting where all key per- 
sonnel involved in the clinical trial meet to discuss im- 
portant aspects of the trial’s methodology, protocol, 
and case report forms. An update on the trial’s progress, 
recruitment, protocol deviations, and any other pro- 
blems encountered can also be discussed. 


Key Concepts: Items to Prepare when Planning an 
Investigators’ Meeting 

• Venue 

• Budget and funding source 

• Invitations 

• Attendance list 

• Agenda 

• Presentation 

• Minutes documenting key discussion and decisions 


Preparation of Reports and Publications 


Throughout the trial and at the end of the trial, several 
reports are written. The data safety and monitoring board 
requires reporting of all safety data at multiple time points 
throughout the trial. The data analyst and the study 
coordinator typically prepare these reports, in consultation 
with the nominated principal investigator and the senior 
biostatistician. In addition, the steering committee may 
request reports summarizing recruitment, follow-up rates, 
protocol deviations, and adverse events to determine the 
overall progress of the trial. 

As a trial comes to a close, a substantial amount of time is 
required to ensure that the quality of the data is high and 
that the data are ready to be published. The data analyst 
performs the majority of the statistical analysis after 
consultation with the senior biostatistician and the 
nominated principal investigator. The steering committee also 
provides significant input into any publications or 
presentations at meetings. 


Conclusions 


The central methods center is vital to the success of any 
clinical trial or research initiative. The role and size of the 
central methods center depend on the study design, 
the complexity of the trial, and the size of the trial. 


The Role of a Data Monitoring Committee 


“Next to doing the right thing, the most important thing is 
to let people know you are doing the right thing.” 
John D. Rockefeller 


Summary 


The role, membership, and organization of a data monitoring 
committee are described in this chapter. A checklist to 
aid in the determination of the need for a data monitoring 
committee is also provided. 


Introduction 


The successful execution of a clinical trial is dependent 
upon the participation of various groups and committees. 
As the safety of participants is of paramount importance in 
all research projects involving human subjects, parameters 
must be in place to constantly evaluate accumulating data 
to ensure participant safety and the scientific merit of 
ongoing research.¹ The Canadian Institutes of Health 
Research (CIHR), National Institutes of Health (NIH), and 
many other funding and regulatory bodies require that 
participant safety, trial integrity, and data validity be 
ensured by the establishment of appropriate monitoring 
systems to assess the risks and benefits of trial 
participation.² To address this need in large, randomized, 
multicenter trials that evaluate interventions that reduce 
serious adverse events and prolong life, a data monitoring 
committee (DMC) is often established.³ 


Key Concepts: Data Monitoring Committee 

A data monitoring committee consists of a group of 
individuals with relevant expertise and experience who 
regularly monitor interim data during ongoing trials. In an 
advisory role, they are responsible for assessing the 
progress of a trial in terms of participant safety, study 
efficiency, and validity. Based on their observations, they 
make recommendations to the study sponsors, funding 
agencies, and principal investigators to continue the trial, 
with or without protocol modifications, or to 
conclude the research. A data monitoring committee is 
synonymous with a data safety monitoring board.¹ 


Jargon Simplified: Interim Data Monitoring 

Interim data monitoring refers to the periodic review of 
accumulating trial data, distinct from the continual 
monitoring by the coordination center of trial conduct 
during data processing and source verification.² 


History of the Data Monitoring Committee in North America 


Since the early 1960s, DMCs have been components of clinical 
research, primarily in large, randomized multicenter 
trials sponsored by national agencies like the Canadian 
Institutes of Health Research and the U.S. National 
Institutes of Health (NIH). In 1967, an external advisory 
panel first recommended to the NIH that an official committee 
perform interim data examination to monitor trial conduct, 
safety, and effectiveness.¹ In accordance with this 
recommendation, in 1979, the NIH Research Clinical Trials 
Committee published a guide formally suggesting that “every 
clinical trial should have a provision for data and safety 
monitoring” and that “a variety of types of monitoring 
may be anticipated depending on the nature, size, and 
complexity of the clinical trial.” These recommendations were 
reaffirmed in 1994 when the Committee on Clinical Trial 
Monitoring was established by the NIH Office of Extramural 
Research to review the monitoring activities of clinical 
trials.² Trials sponsored by the medical device and 
pharmaceutical industries have only recently started to 
incorporate DMCs. This is a result of the increase in 
industry-sponsored trials concerned with major morbidity 
outcome measures, increased collaboration between government 
and industry sponsors, and increased public awareness of the 
issues that may lead to inaccurate, biased results.¹ Much of 
the published guidance now available to clinical researchers 
pertaining to the role, conduct, and responsibilities of DMCs 
can be found in the Health Technology Assessment-commissioned 
DAMOCLES Project (DAta MOnitoring Committees: Lessons, 
Ethics and Statistics) in 2003.⁴ Please refer to the 
Suggested Reading section for further information. 


General Functions of the Data Monitoring Committee 


The DMC is responsible not only to study investigators,
sponsors, steering committee members, and participants, but
also to all future patients enrolled in the trial and to
society in general.’ The general advisory responsibilities
and functions of the DMC, previously outlined in Chapter 35
of this text, are listed below for your convenience.


Key Concepts: Responsibilities of the Data Monitoring
Committee

• “Review the research protocol, informed consent
  documents and plans for data safety and monitoring

• Evaluate the progress of the trial, including periodic
  assessments of data quality and timeliness, participant
  recruitment, accrual and retention, participant risk
  versus benefit, performance of the trial site, and other
  factors that can affect study outcome

• Consider factors external to the study when relevant
  information becomes available, such as scientific or
  therapeutic developments that may have an impact on the
  safety of the participants or the ethics of the trial

• Review study performance, make recommendations,
  and assist in the resolution of problems reported by
  the principal investigator

• Protect the safety of the study participants

• Report on the safety and progress of the trial

• Make recommendations to the funding agency, the
  principal investigator, and, if required, to the Food
  and Drug Administration (FDA) concerning continuation,
  termination, or other modifications of the trial
  based on the observed beneficial or adverse effects
  of the treatment under study

• If appropriate, conducting an interim analysis of
  efficacy in accordance with stopping rules which are
  clearly defined in advance of data analysis and have
  the approval of the data safety and monitoring board

• Ensure the confidentiality of the trial data and the
  results of monitoring”?


The Necessity of a Data Monitoring Committee 


Although all clinical trials - Phase I, II, and III - must neces- 
sarily monitor data and safety, not all trials require a formal 
committee external to the study investigators and spon- 
sors to perform this duty.' The degree of risk of participa- 
tion in the study dictates the method and degree of mon- 
itoring.” This monitoring may be performed in a variety 
of ways ranging from internal monitoring by the principal 
investigator, study, or sponsor staff — as is the case in small 
phase I trials - to the establishment of a formal, indepen- 
dent DMC for large phase III trials. Clinical “on site” moni- 
toring of protocol adherence, data management, and qual- 
ity control can be performed internally by the blinded 
sponsor or study staff.' Identification of potential adverse 
events and protocol amendments are the monitoring re- 
sponsibility of blinded study investigators and sponsors.! 


Jargon Simplified: Phase I, II, and III Trials

Phase I Trial - Phase I trials involve the study of new in- 
terventions with a relatively high risk of adverse 
events for a small group of patients. Due to the high 
risk, regulatory bodies often require continuous 
safety monitoring and reporting. 

Phase II Trial - Phase II trials usually follow phase I trials 
when more information is known about the inter- 
vention or disease being studied. Safety and data 
monitoring is very similar to that in phase I trials. 

Phase III Trial - Phase III trials usually involve many
participants subjected to long periods of intervention
exposure. New interventions are often compared
with standard interventions or a placebo. Treatment
allocation is random with data usually blinded. Safety
and data monitoring is most commonly performed by
a data monitoring committee.


In determining whether your clinical trial requires the es- 
tablishment of a formal DMC, consult Table 37.1. If you an- 
swer yes to any of the questions, it is recommended that 
you consider establishing an external, independent body 
to perform interim data monitoring as issues related to 
participant safety, data validity, and trial efficiency are 
more likely to arise. 
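
The decision rule described above (any "yes" on the Table 37.1 checklist suggests considering an independent DMC) can be sketched in a few lines of Python. This is only an illustration: the dictionary keys and function names are hypothetical, the questions are paraphrased from the table, and the result is a prompt for discussion rather than a regulatory determination.

```python
# Paraphrased Table 37.1 questions; the short keys are hypothetical names
# introduced only for this sketch.
CHECKLIST = {
    "large_scope": "Is the scope of the trial large?",
    "randomized": "Is the study a randomized trial?",
    "multicenter": "Does the trial consist of multiple study centers?",
    "targets_major_outcomes": "Is the intervention intended to reduce major "
                              "adverse health outcomes or prolong life?",
    "high_risk_condition": "Are participants at increased risk of major "
                           "adverse outcomes because of their condition?",
    "long_duration": "Is the trial long enough for a DMC to be practical?",
    "treatment_safety_concern": "Is there a particular safety concern with "
                                "the trial treatment?",
    "novel_treatment": "Is the treatment novel, with little safety data?",
    "fragile_population": "Are participants from a potentially fragile "
                          "population?",
}


def flagged_items(answers):
    """Return the checklist keys answered 'yes', in checklist order."""
    unknown = set(answers) - set(CHECKLIST)
    if unknown:
        raise KeyError(f"Unknown checklist items: {sorted(unknown)}")
    return [key for key in CHECKLIST if answers.get(key, False)]


def should_consider_dmc(answers):
    """Any single 'yes' is enough to suggest considering a formal DMC."""
    return bool(flagged_items(answers))


if __name__ == "__main__":
    example = {"randomized": True, "multicenter": True, "novel_treatment": False}
    for key in flagged_items(example):
        print("Yes:", CHECKLIST[key])
    print("Consider an independent DMC:", should_consider_dmc(example))
```

Unanswered items are treated as "no" here, which is a simplifying assumption; in practice every question would be answered explicitly.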


Examples from the Literature: Checklist to Determine 
the Need for a Data Monitoring Committee 


Large, multicenter trials of long duration increase the risk 
of safety concerns because the greater, prolonged expo- 
sure may cause adverse events that are initially not recog- 
nizable.! Trials involving vulnerable populations or high- 
profile, pivotal interventions with the potential to pro- 
foundly alter clinical practice should also employ a DMC 
for safety and data monitoring to support the scientific 
merit of the study.’ 

It may not be practical to establish a DMC for trials of 
short duration as the DMC may be unable to adequately 
contribute before study results are disseminated.! Further- 
more, it is often unnecessary to establish a DMC for admin- 
istrative or behavioral trials, studies aimed to demonstrate 
a biological principle, or trials involving interventions with 
minimal associated risks to participants.* 
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Table 37.1 Checklist to Determine the Need for a Data Monitoring Committee 























Is the scope of the trial large? Yes No
Is the study a randomized trial? Yes No
Does the trial consist of multiple study centers? Yes No
Is the trial intervention intended to reduce major adverse health outcomes or prolong life? Yes No
Are trial participants subject to an increased risk of major adverse health outcomes as a result of their condition, regardless of whether the study itself addresses these major adverse health outcomes or lesser outcomes like symptom relief? Yes No
Is the trial duration of great enough length to warrant establishment of a DMC as being practical? Yes No
Is there a particular safety concern or elevated risk associated with administering the trial treatment? Yes No
Is the trial treatment novel, with little safety information available? Yes No
Are study participants considered members of a potentially fragile population (for example, children or the very elderly), which increases safety concerns as medical interventions may cause or increase an already elevated risk of morbid, adverse events? Yes No


Source: Adapted from Guidance for Clinical Trial Sponsors. Establishment and Operation of Clinical Trial Data Monitoring Committees.
U.S. Department of Health and Human Services, Food and Drug Administration, Center for Biologics Evaluation and Research (CBER),
Center for Drug Evaluation and Research (CDER), Center for Devices and Radiological Health (CDRH).
Available at: http://www.fda.gov/CBER/gdlns/clintrialdmc.pdf. Accessed August 31, 2006.


Overview of the Data Monitoring Committee 


Independence 


A DMC is considered to be independent of study leadership
if the members have no other involvement in trial design 
and conduct aside from their membership in the DMC. 
Members of an independent DMC should not have any as- 
sociations, including financial or intellectual connections, 
with the individuals responsible for trial organization, 
conduct, and sponsorship.' There are many benefits asso- 
ciated with an independent DMC. Free from the influence 
of trial leadership, DMCs may be more objective in their re- 
view of interim results and consequent recommendations. 
This is not only beneficial for participants, in ensuring their 
safety and protection, but also advantageous for sponsors 
and investigators by promoting the scientific integrity 
and validity of the trial.1 In reality, an entirely
independent DMC is very rare. This is because study sponsors
and investigators are responsible for selecting committee
members, reimbursing members for their time, travel, and
communication expenses, and outlining the standard
operating procedures of the DMC.1 The literature suggests
that this lack of independence may actually be beneficial, as
“intimate knowledge of a trial is required to adequately
analyze, interpret and act upon accumulating data.”4
Nevertheless, all necessary efforts should be made to limit 
the influence of trial leadership upon the conduct of the 
DMC to preserve impartiality and the validity of their 
data review and safety monitoring. 


Jargon Simplified: Trial/Study Leadership 

Trial or study leadership refers to the trial sponsors, 
principal investigators, and steering committee mem- 
bers who are responsible for the overall management 
of the trial. 


Composition and Selection 


The trial sponsor, principal investigator, or steering
committee may appoint the members of the DMC. Alternatively,
the trial leadership may appoint a chairperson for the
DMC who will, in turn, select the remainder of the
committee. DMCs consist of three or more individuals,
depending on the need for representation of multiple
disciplines.’

To simplify decision making and facilitate consensus, it 
is advised that the committee be composed of an odd num- 
ber of members.’ It is logistically preferable to have a small 
number of committee members to facilitate frequent 
meetings. Sponsors, investigators, and regulatory authori- 
ties should not be members of the DMC as knowledge of 
unblinded interim data may affect their usual behavior in 
recruiting, treating, and monitoring the progress of pa- 
tients, which, in turn, will bias the study results and nega- 
tively affect the validity of the trial.! 

Important factors in selecting committee members include
pertinent clinical expertise, experience with DMCs and
clinical trials, and the absence of conflicts of interest.
Although preferable, it is not necessary that all members
have prior experience serving on DMCs.' Committee mem- 
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bers may undergo informal training involving review of 
published case studies and attendance of other DMCs as 
an observer.* Committee members should have a reputation
for objectivity, sound judgment, and impartiality, and
should be willing to deal with the pressures that may arise
from trial leadership or the media.*

The committee is usually composed of members with 
different areas of expertise, which are outlined in Key 
Concepts below. Committee membership is also given in 
Fig. 37.1. 


Key Concepts: Key Members of the Data Monitoring 
Committee 

• A chairperson, with prior experience serving as part of
  a data monitoring committee (DMC), firmly committed
  to the trial for its duration. The chairperson
  will have an administrative scientific leadership role,
  facilitate discussion and consensus, and act as the
  liaison between the DMC and trial leadership.‘

• Clinicians with clinical or surgical expertise in the field
  of study

• A biostatistician, knowledgeable about the statistical
  methods and analyses of clinical trials

• Nonscientists representing the perspective of the public.
  These individuals will not participate in the trial as
  subjects, but may have the condition being studied or
  be a representative of their nation in the case of
  international studies where cultural issues may arise.

• Other scientists and clinicians with specific knowledge
  that is applicable to the trial including pharmacologists,
  toxicologists, and epidemiologists

• Individuals with medical ethics knowledge and
  experience. This may include lawyers.


Adapted from the Guidance for Clinical Trial Sponsors.
Establishment and Operation of Clinical Trial Data
Monitoring Committees. U.S. Department of Health and
Human Services, Food and Drug Administration, Center for
Biologics Evaluation and Research (CBER), Center for
Drug Evaluation and Research (CDER), Center for Devices
and Radiological Health (CDRH). Available at:
http://www.fda.gov/CBER/gdlns/clintrialdmc.pdf. Accessed
August 31, 2006.1


The committee members should appropriately represent
gender, racial, and ethnic groups, and all members must
be dedicated to maintaining the confidentiality of the
unblinded interim data.’ Sponsors, regulatory authorities,
and investigators should not be members of the DMC.
According to current literature, it is essential that
conflicts of interest be avoided or minimized, as “patient
interests are best represented by someone who will realize no
scientific, academic, or financial gain from the trial.”4
The absence of conflicts of interest supports the integrity
of the trial and the credibility of the recommendations made
by the DMC. Please see Table 37.2 to determine whether any
conflicts of interest exist.


Examples from the Literature: Conflicts of Interest 
Checklist 

Table 37.2 Conflicts of Interest Checklist for Data Monitoring Committee Members

Do any members of the DMC have

Financial interests that may be affected by the trial outcome and results? These interests may include ownership in the pharmaceutical or device companies involved, investment in competing interventions or companies, or consulting arrangements with the trial sponsor. Yes No
“Intellectual” conflicts of interest? These arise when individuals possess strong opinions on the merits of the scientific interventions involved in the trial. Yes No
Hands-on involvement in the planning or conduct of the trial? Yes No
Involvement in the authorship or publication of the trial? Yes No
Involvement in regulating the conduct of the trial? Yes No
An emotional involvement with the trial or relationships with the trial leadership? Yes No

Source: Data from Sydes MR, Spiegelhalter DJ, Altman DG, Babiker AB, Parmar MK; DAMOCLES Group. Systematic qualitative review of the literature on data monitoring committees for randomized controlled trials. Clin Trials 2004; 1:60-79.

If you answered yes to any of the questions above, or if
other conflicts of interest arise, one of the following
actions must be taken depending on the scope of the conflict:

• For minor conflicts of interest, simple disclosure to all
  members of the DMC may be sufficient.

• For major conflicts of interest, individuals must
  eliminate the conflict or refrain from serving as a member
  of the DMC.4


Standard Operating Procedures 


Before study initiation, sponsors and principal investiga- 
tors must create a section in the study protocol devoted 
to DMCs and include instructions pertaining to the hand- 
ling of conflicts of interest. Procedures to ensure confiden- 
tiality of interim data must also be included in this section 
of the protocol. The unblinded interim data presented to 
the DMC should not be available to any of the individuals 
conducting the trial, as this may bias their study conduct, 
trial design, and plan for analyses.' The schedule, fre- 
quency, format, attendance, and conduct of DMC meetings, 
as well as the format for interim data presentation should 
also be established in the protocol.’ Collectively, these 
well-defined procedures can be referred to as standard op- 
erating procedures (SOPs). The SOPs may be drafted by 
either the sponsor or DMC and approved prior to study in- 
itiation.! 


Data Preparation and Presentation 


Interim data should be compiled by the trial statistician 
and appropriate analyses should be performed before pre- 
sentation to the DMC.' Although analyses by an indepen- 
dent statistician would preserve the confidentiality of in- 
terim data, only the trial statistician has the necessary 
knowledge of the condition being studied, data manage- 
ment, and other aspects of trial organization for appropri- 
ate interim analyses.’ The statistician should prepare the 
data in two sets. The first should consist of “open” informa- 
tion that is appropriate for all individuals involved in the 
trial to review without the risk of revealing blinded interim 
results. This data focuses on trial conduct issues and may 
include trial eligibility, recruitment, and withdrawal rates. 

The second set of data may consist of “closed” informa- 
tion appropriate for members of the DMC to access only. 
This includes unblinded interim results that would likely 
bias future trial management and analyses by sponsors 
and investigators. The unblinded data should be presented 
using codes (for example, group 1 and group 2) that are 
only known to the study statistician and members of the 
DMC to avoid inadvertent unblinding if the data are ever 
misplaced.1 Inadvertent unblinding of the sponsor poses
a risk of further unblinding of participants and
investigators, which may hinder the objectivity of trial
conduct and thus damage the integrity of the trial.’ It is
imperative that the interim data presented to the DMC be as
up-to-date as possible.’ The statistical approach and
presentation of interim data are proposed by the sponsor,
investigators, and steering committee prior to study
commencement and are subject to approval by the DMC during
an initial meeting.1 It is essential that confidentiality be
maintained throughout all trial phases, including periods of:?

• Data collection

• Data monitoring

• Interim results preparation

• Emerging results review

• Monitoring recommendations
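
The split between "open" and "closed" interim data, with treatment arms shown only under neutral codes, can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the record fields, the function names, and the toy data are hypothetical, and a real trial would follow the reporting formats fixed in its own standard operating procedures.

```python
import random
from collections import Counter


def make_code_map(arms, seed=None):
    """Assign neutral labels ("group 1", "group 2", ...) to arms in random order.

    The mapping itself is the confidential key and would be held only by the
    study statistician and the DMC.
    """
    shuffled = list(arms)
    random.Random(seed).shuffle(shuffled)
    return {arm: f"group {i + 1}" for i, arm in enumerate(shuffled)}


def open_report(records):
    """Pooled trial-conduct figures that reveal nothing about arm-level results."""
    return {
        "enrolled": len(records),
        "withdrawn": sum(1 for r in records if r["withdrawn"]),
    }


def closed_report(records, code_map):
    """Event counts per coded group, intended for DMC members only."""
    totals, events = Counter(), Counter()
    for r in records:
        label = code_map[r["arm"]]
        totals[label] += 1
        events[label] += r["event"]
    return {label: {"n": totals[label], "events": events[label]}
            for label in sorted(totals)}


# Toy data, invented for this sketch only.
records = [
    {"arm": "intervention", "event": 0, "withdrawn": False},
    {"arm": "intervention", "event": 1, "withdrawn": False},
    {"arm": "control", "event": 1, "withdrawn": True},
    {"arm": "control", "event": 1, "withdrawn": False},
]

if __name__ == "__main__":
    code_map = make_code_map(["intervention", "control"])
    print("Open report:", open_report(records))
    print("Closed report:", closed_report(records, code_map))
```

If a printed closed report were misplaced, a reader would see only "group 1" and "group 2"; without the code map, the arm identities stay concealed.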


To enable DMC members to sufficiently explore, consider,
and evaluate the information, interim data and statistical
analyses should be made available approximately 2 weeks
prior to committee deliberations.*


Data Monitoring Committee Meetings 


Frequency 


The frequency of meetings is largely dependent upon the 
scope and duration of the trial. Depending on the perceived
risk of the study interventions, annual meetings
may be adequate for some studies, whereas others call for
closer review. For example, studies of rapidly progressing,
fatal conditions with short recruitment periods may require
more frequent monitoring than studies of diseases with good
prognoses that enroll patients over a longer recruitment
period.* The frequency
of meetings, or considerations used to determine the fre- 
quency, should be outlined in the study protocol as part 
of the standard operating procedures of the DMC.! 


Meeting Format 


It is preferable for DMC members to meet in person;
however, in cases where issues must be urgently addressed,
it may be necessary for deliberations to be made
by telephone or via electronic communication.1 Furthermore,
when numerous meetings have already been held
and DMC members are aware of the issues that may arise,
communication via telephone, e-mail, or traditional mail
may suffice.1


Initial Meeting 


The DMC should meet prior to commencement of the 
study to discuss the study protocol, standard operating 
procedures, informed consent forms, data collection pro- 
cedures, analysis plan, trial documentation, and plans for 
monitoring data safety and effectiveness. Investigators, 
sponsor representatives, and regulatory representatives 
may be present during these discussions.* This meeting 
is essential, as it is important for DMC members to become 
more acquainted with one another as they build success- 
ful, professional working relationships.* Any amendments 
and modification proposals can be presented to the study
sponsors and investigators.1 This initial meeting also
provides the DMC an opportunity to discuss the specific
conduct and management of their meetings, including the
format of data presentation, procedure for making decisions,
and the handling of meeting documentation (meeting
minutes and DMC recommendations to study personnel).1


Structure of Meetings 


Each meeting of the DMC should consist of an initial open 
session for all study personnel, a closed session for the 
study statistician and DMC members only, and a conclud- 
ing open session to allow recommendations of the DMC to 
be presented to study sponsors and investigators.! This for- 
mat is designed “to preserve confidentiality while maxi- 
mizing the opportunities for interaction with all indivi- 
duals who would have valuable input for the committee.”4 


Key Concepts: Data Monitoring Committee Meeting
Structure

• The initial open session:
  It is necessary for DMC members and study personnel
  to discuss issues associated with conduct of the trial
  and external data that may influence the trial results.
  During this session, the study statistician may present
  nonconfidential/blinded data that will not affect the
  objectivity of the study investigators and sponsors
  during trial management and execution.1 This
  information may include:
  - Eligibility rates
  - Ineligibility factors
  - Recruitment status
  - Baseline characteristics of the patient population
  - Accuracy and timeliness of data collection at study
    sites1

  It is beneficial to have many members, representing
  various groups involved in the trial, present for the
  open session to provide intimate insights into all
  aspects of the trial, raise awareness about key issues
  for the DMC to consider, facilitate discussion, and
  provide answers for pressing questions and issues.’
  Individuals should include representatives of the trial
  sponsors, investigators, steering committee, and
  even regulatory bodies when applicable.1

• The closed session:
  The closed session will involve monitoring of trial
  safety, effectiveness and conduct. A fundamental
  responsibility of the DMC is to evaluate whether the trial
  objectives are being met by the end points specified in
  the protocol.1 The DMC will monitor the interim data
  to determine the effectiveness of the trial. Based on
  accumulating results, if the probability that the trial
  intervention will demonstrate effectiveness is very
  small, the DMC may recommend early termination
  due to futility.1

• The concluding recommendation session:
  Acting in an advisory role, the DMC may make several
  types of recommendations to the study sponsors and
  investigators. Findings of the DMC during interim
  review of the data usually support continuation of the
  study as designed. If this is not the case, the DMC
  may make several recommendations.


Meeting Documentation 


The DMC must ensure that accurate minutes are taken of 
all meetings. These minutes should be divided into separate
parts for the open and closed sessions of the meeting.
Minutes of the closed session, and the confidential
material they contain, should be available only to members
of the DMC and should not be circulated to the study
leadership, because disclosure could compromise the
integrity of the trial, as discussed previously.1


Data Monitoring Committee Recommendations 


The recommendations of the DMC fall into three categories:
termination, continuation with modification, or other.


Key Concepts: Types of Data Monitoring Committee
Recommendations

• Recommendations to terminate the study:
  Recommendations to terminate the trial occur for
  several reasons, including:

  1. Futility: This occurs when the interim results reflect
     a very low probability that the study interventions
     will prove effective.

  2. Demonstrated effectiveness: This may occur if the
     interim data indicate that study interventions
     effectively address the study objectives, with
     individuals doing better than their control counterparts
     according to the outcome measures indicated in the
     objectives.

  3. Safety concerns: These may occur if emerging results
     show an increased risk of adverse health events in
     individuals subject to the study intervention. These
     adverse events are assessed with respect to type,
     severity and extent.

• Recommendations to continue the study with major or
  minor modifications:
  Modifications to study eligibility criteria may be
  recommended if interim data reflect an increased risk
  of adverse events in a particular study subpopulation.
  Other modification recommendations may include
  screening procedures to determine which patients
  may have an increased risk of adverse events, protocol
  revisions, trial termination at particular sites,
  increased duration of recruitment or follow-up periods,
  or amendments to study objectives and secondary
  hypotheses.* If these recommendations are utilized by
  the trial leadership, patients must be made aware of
  the changes and informed of any other new information
  with a modified consent form.‘

• Other recommendations:
  Other recommendations include temporary cessation
  of enrollment or the unblinding of study staff at a
  particular site. Upon assessment of specific adverse
  events, the DMC may need to recommend that a clinical
  site is unblinded to the treatment intervention to
  ensure appropriate action is taken to address the
  adverse event.1
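
The futility criterion listed above ("a very low probability that the study interventions will prove effective") is not tied to a specific statistical method in this chapter. One commonly used way to put a number on it is conditional power under the current trend. The Python sketch below is a minimal illustration under stated assumptions: a single normally distributed test statistic, a one-sided final test, and hypothetical function and parameter names. Any actual stopping rule must be the one defined in advance in the trial protocol.

```python
from math import sqrt
from statistics import NormalDist


def conditional_power(z_interim, info_fraction, alpha_one_sided=0.025):
    """Chance of a significant final result if the current trend continues.

    z_interim      -- interim standardized test statistic (positive favors
                      the intervention)
    info_fraction  -- proportion of the planned statistical information
                      accrued so far, strictly between 0 and 1
    """
    if not 0.0 < info_fraction < 1.0:
        raise ValueError("info_fraction must be strictly between 0 and 1")
    normal = NormalDist()
    z_crit = normal.inv_cdf(1.0 - alpha_one_sided)
    # B-value formulation: B(t) = Z(t) * sqrt(t); the current-trend drift
    # estimate is B(t) / t, so the projected final mean is also B(t) / t.
    b_value = z_interim * sqrt(info_fraction)
    projected_final_mean = b_value / info_fraction
    remaining_sd = sqrt(1.0 - info_fraction)
    return 1.0 - normal.cdf((z_crit - projected_final_mean) / remaining_sd)


if __name__ == "__main__":
    for z in (0.0, 1.0, 2.0, 3.0):
        cp = conditional_power(z, info_fraction=0.5)
        print(f"interim z = {z:.1f}: conditional power = {cp:.3f}")
```

A very low conditional power would be one input to a futility discussion, alongside safety data and external evidence; what counts as "very low" is a design decision and is not specified here.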


The documented recommendations of the DMC should be 
very clearly expressed to the study sponsors and investiga- 
tors through written and oral communication. The mini- 
mum amount of data required to support the rationale 
for the recommendations should be included to enable 
the study leadership to act upon the recommendations 
in an informed and appropriate manner. The open recom- 
mendation session of the DMC meeting facilitates ques- 
tions and discussion pertaining to the recommendations. 


Other Responsibilities


Other DMC responsibilities include reviewing study conduct
data alongside the study leadership. Recommendations
may be made if these conduct data reflect events
and rates that may compromise the safety of participants
or trial integrity. These data include:

• Overall study and site-specific rates of recruitment,
  ineligibility, drop-out, and loss to follow-up

• Protocol noncompliance and violations

• Rigor, accuracy, and timeliness of data collection

• Balance between study interventions


DMCs may also provide independent, unbiased expert
counsel to study leadership and Institutional Review
Boards (IRBs) when ethical issues arise.1 Furthermore,
funding and regulatory bodies may have specific data
and safety monitoring requirements. Please consult your
local funding or regulatory body to determine what these
requirements are and how the standard operating procedures
of your DMC should be modified accordingly. As an
example, for American studies subject to regulation by
the FDA, any DMC recommendations and requests pertaining
to patient safety, like intervention dosage adjustment
due to toxicity, must be reported by the study leadership
to the FDA and responsible IRBs.1


Conclusions


Depending on the magnitude of a trial and the severity
of concerns about participant safety, a data monitoring
committee may or may not be needed to ensure the safety and
effectiveness of a trial. It is responsible for independently
reviewing the protocol and the ongoing study with respect to
the completeness, validity, and confidentiality of the data
collection and patient safety. Furthermore, the committee
performs an unblinded interim analysis of the outcome data,
the results of which remain unknown to the investigator and
the sponsor. Based on its findings, the committee makes
recommendations to the investigators and regulatory bodies
on continuing, modifying, or terminating the trial. It is of
utmost importance that interim results remain unknown to
sponsors, investigators, and regulatory authorities, since
this knowledge could bias the final study findings.

For further information concerning data monitoring
committees, please consult the Suggested Reading list.


Suggested Reading


Guidance for Clinical Trial Sponsors. Establishment and Operation of
Clinical Trial Data Monitoring Committees. U.S. Department of Health and
Human Services, Food and Drug Administration, Center for Biologics
Evaluation and Research (CBER), Center for Drug Evaluation and
Research (CDER), Center for Devices and Radiological Health (CDRH).
Available at: http://www.fda.gov/CBER/gdlns/clintrialdmc.pdf.
Accessed August 31, 2006

Sydes MR, Spiegelhalter DJ, Altman DG, Babiker AB, Parmar MK; DAMOCLES
Group. Systematic qualitative review of the literature on data
monitoring committees for randomized controlled trials. Clin Trials
2004; 1:60-79
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The Need for Separate Adjudication of Outcomes 


“Everyone is entitled to his own opinion, but not to his own facts.”
- Senator Daniel P. Moynihan


Summary


The goal of this chapter is to inform surgeons about the
role of separately adjudicating outcomes in a surgical
study. The concept of adjudication is described, including
its purpose and examples of how to adjudicate various
outcomes. Additionally, important considerations for
choosing whether to incorporate adjudication into surgical
trials are discussed.


Introduction


An increasing number of large, randomized controlled
trials (RCTs) are being conducted among orthopaedic
surgery and orthopaedic trauma populations. These trials
aim to establish which surgical interventions result in
the best outcomes for a specified patient group.


Jargon Simplified: Randomized Controlled Trial

A randomized controlled trial is an “experiment in
which individuals are allocated randomly to receive or
not receive an experimental preventative, therapeutic,
or diagnostic procedure and then are followed up to
determine the effect of the intervention.”1


It is important that such trials are of high quality and
minimize sources of bias, so that surgeons can use the
results when making decisions on how to best care for their
patients.


Jargon Simplified: Bias

Bias is “a systematic tendency to produce an outcome
that differs from the underlying truth.”1


One source of bias can occur when the effect of an
intervention is measured or classified differently in one
group versus the other. For example, we could be more
vigilant looking for outcomes in the experimental group in
a trial if we are aware of which patients are receiving the
experimental versus the control treatment.


Key Concepts: Types of Bias We Aim to Minimize by
Using Adjudication

“Ascertainment bias occurs when the results or
conclusions of a trial are systematically distorted by
knowledge of which intervention each participant is
receiving.”

Detection or surveillance bias is “the tendency to look
more carefully for an outcome in one of two groups
being compared.”1


To guard against bias in determining the effect of an inter- 
vention, we can attempt to blind or mask the treatment in- 
tervention from people involved in the trial, including the 
participants, health care providers, data collectors, outcome 
assessors, data analysts, data safety and monitoring com- 
mittee, and manuscript writers.’ Placebos are an impor- 
tant strategy for achieving blinding during data collection; 
however, when the intervention is a surgery rather than a 
drug, it may not be feasible or ethical to use a placebo.”* 


Jargon Simplified: Blinding 

“The participant of interest is unaware of whether pa- 
tients have been assigned to the experimental or control 
group. Patients, clinicians, those monitoring outcomes, 
judicial assessors of outcomes, data analysts, and those 
writing the paper all can be blinded or masked.”¹ 


In a surgical trial, the attending surgeon will most often be 
aware of the treatment group to which the patient was as- 
signed. Potential study outcomes are usually self-reported 
by participants or reported by the investigators at the 
study site. However, bias can be reduced by having some- 
one other than the attending surgeon assess the study out- 
comes or by having an independent individual or group 
evaluate, or adjudicate, the outcomes reported by the at- 
tending surgeon. The process of adjudicating outcomes in 
a clinical trial is discussed in this chapter, as well as the 
key items to consider when choosing whether to use this 
process in your trial. 


Central Adjudication 

Purpose 


Adjudication is a process used in clinical studies whereby a 
committee of independent, blinded experts determines 
whether investigator-reported events meet the definition 
of a study event. Such committees may have different titles 
according to the various studies, such as clinical events 
committee (CEC), events classification committee (ECC), 
events committee, and central adjudication committee 
(CAC). Regardless of the committee’s title, it generally has 
the same purpose of classifying study events. 

The purpose of conducting a central, blinded, independent 
evaluation of reported outcomes is to reduce bias due to 
knowledge of the treatment group. Additionally, central 
adjudication of outcomes can reduce variation in how 
events are classified across multiple study sites, which 
arises when sites interpret the definition of the event 
differently. 


Key Concepts: Central Adjudication 

Adjudication is a process used in clinical studies 
whereby a committee of independent, blinded experts 
determines whether investigator-reported events meet 
the definition of a study event. 


The Decision Whether or Not To Adjudicate 


Not all investigators decide to use a central adjudication 
process in their trial; it usually depends on the choice of 
primary outcome and availability of resources. If the inves- 
tigators decide to centrally adjudicate outcomes, they must 
consider several factors in the organization of this process. 

Cook et al list several decisions investigators make regarding 
the adjudication process: “the size and composition 
of the committee, training requirements, whether to 
use adjudication teams, what outcomes will be adjudicated, 
how cases will be allocated to adjudicators, to what extent 
the review committee will be blinded and independent, 
how the consensus will be achieved, and how study results 
vary with and without adjudication of events.” 


Key Concepts: Factors to Consider 

• Choice of primary outcome 
• Availability of resources 


Primary Outcome 


Jargon Simplified: Primary or Target Outcome 

Primary Outcome: “The primary outcome is the main 
event or condition that the trial was designed to eval- 
uate.” 


Target Outcome: “In treatment studies, the condition in- 
vestigators or clinicians are particularly interested in 
identifying and which it is anticipated the interven- 
tion will decrease or increase.”” 


A centralized reading of clinical information may be more 
important when (1) reading procedures are complex and 
require special skills or training, (2) a high degree 
of uniformity and standardization is required, (3) large 
volumes of records are to be read, or (4) separation of the 
reading and treatment process is desired. Cook et al® ex- 
plain that “outcomes requiring less judgment, such as phy- 
siological variables, laboratory measures, and all-cause 
mortality, are less likely to undergo such adjudication.” 

In considering adjudication of outcomes in orthopaedic 
trauma surgery trials, it is important to think about the 
types of outcomes that are typically used in these studies 
and the variation among observers in determining that 
these outcomes have occurred. 

Lochner and colleagues⁹ reviewed randomized trials 
involving fracture care published between 1968 and 1999; 
94% reported on multiple outcomes, and the authors found 
190 primary outcome measures among 117 trials. Of these, 
50% were clinical or functional scores, 11% were radio- 
graphic, and 11% were complications. 

Csimma and Swiontkowski report that common out- 
comes in orthopaedic trials include fracture healing and 
infection, although the definition of fracture union is not 
consistent between studies.¹⁰ Whelan et al¹¹ investigated 
the interobserver and intraobserver variation among sur- 
geons assessing radiographic healing of tibia fractures 
using four criteria: overall fracture healing, quality of cal- 
lus, number of cortices with bridging callus, and number 
of cortices with visible fracture line. They found that the 
agreement between surgeons was best for the number of 
cortices with bridging callus and number of cortices with 
visible fracture line, whereas intraobserver agreement 
was highest for overall impression of healing and number 
of cortices with bridging callus. 


Jargon Simplified: Interobserver and Intraobserver 
Reliability 

“Reliability refers to consistency or reproducibility of 
data.”’ 

Interobserver reliability is “the extent to which the re- 
sults obtained by two or more observers are similar for 
the same population.” 

Intraobserver reliability is the “variation which occurs 
within an observer as a result of multiple exposures to 
the same stimulus.”"? 


Here we see that different ways of classifying common 
orthopaedic trial outcomes have different rates of 
agreement among raters, and that radiographic outcomes 
in particular lend themselves well to central, blinded, 
standardized classification. 
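
As an illustration of how agreement between two raters can be quantified, the sketch below computes Cohen's kappa, a common chance-corrected agreement statistic. It is not necessarily the statistic used in the studies cited above, and the function name and the example ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters who rated the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of cases")
    n = len(ratings_a)
    # Proportion of cases on which the two raters gave the same rating.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical readings of six radiographs by two surgeons.
surgeon_1 = ["healed", "healed", "not healed", "healed", "not healed", "not healed"]
surgeon_2 = ["healed", "not healed", "not healed", "healed", "not healed", "healed"]
kappa = cohens_kappa(surgeon_1, surgeon_2)
```

In this made-up example the surgeons agree on four of six radiographs, but half of that agreement would be expected by chance, so kappa is about 0.33.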


Examples from the Literature: Adjudication of Deep Vein Thrombosis 

Source: Lassen MR, Borris LC, Nakov RL. Use of the low- 
molecular-weight heparin reviparin to prevent deep- 
vein thrombosis after leg injury requiring immobiliza- 
tion. N Engl J Med 2002;347(10):726-730. 

From Methods: “All the patients underwent ascending 
venography of the injured leg within one week after re- 
moval of the plaster cast or brace. Venography was per- 
formed earlier if there was a clinical suspicion of throm- 
bosis. All venographic procedures were performed by 
the method described by Rabinov and Paulin, with the 
use of iohexol contrast medium (Omnipaque 240, 
Nyegaard). A central adjudication committee of three ex- 
perienced radiologists who were blinded to the treat- 
ment assignments evaluated all the venograms. Discre- 
pancies were resolved by consensus. The criterion for a 
diagnosis of deep-vein thrombosis was an intraluminal 
filling defect seen in at least two projections; thrombi 
confined to the superficial or communicating veins 
were not counted. In cases of a suspected pulmonary 
embolism, ventilation—perfusion lung scanning or pul- 
monary angiography was performed and the images 
were evaluated by the central adjudication committee 
with use of the classification system of the Prospective 
Investigation of Pulmonary Embolism Diagnosis.” 


Examples from the Literature: Adjudication of Fracture Union 

Source: Govender S, Csimma C, Genant HK, et al. Recom- 
binant human bone morphogenetic protein-2 for treat- 
ment of open tibial fractures: a prospective, controlled, 
randomized study of four hundred and fifty patients. 
J Bone Joint Surg Am 2002;84-A(12):2123-2134. 

From Materials and Methods: “As a separate assessment 
of treatment outcomes, an independent evaluation of 
fracture union was conducted by a radiology panel (Os- 
teoporosis and Arthritis Research Group, University of 
California at San Francisco, California) blinded to treat- 
ment allocation and all other patient data. The indepen- 
dent radiology panel based their assessments on a re- 
view of the centrally digitized radiographic images 
from the post-operative visits of each patient. Auto- 
mated algorithms recorded a fracture as united when 
at least two of the three radiologists reported cortical 
bridging and/or disappearance of the fracture lines on 
at least three of the four cortices viewed on the antero- 
posterior and lateral radiographs. 

An additional analysis combined clinical and radio- 
graphic endpoints. The investigators’ determination of 
treatment success or failure was corroborated by inte- 
grating the criteria of both the surgeons and the inde- 
pendent radiology panel. An outcome was considered 
to be successful when the fracture had healed without 
a secondary intervention (according to the investigator) 
and was recorded as radiographically united during 
patient follow-up (according to the radiology panel).” 
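
The decision rule quoted above can be written out as a short sketch. This is only an illustration of the rule as described in the quotation, not the trial's actual software; the function names and the data layout (four booleans per radiologist, one per cortex) are assumptions.

```python
def radiologist_reports_union(cortices_bridged):
    """One radiologist's reading: four booleans, one per cortex seen on the
    anteroposterior and lateral radiographs. True means cortical bridging
    and/or disappearance of the fracture line was reported for that cortex."""
    if len(cortices_bridged) != 4:
        raise ValueError("expected readings for exactly four cortices")
    return sum(cortices_bridged) >= 3  # at least three of the four cortices


def panel_records_union(panel_readings):
    """The fracture is recorded as united when at least two of the three
    radiologists report union."""
    if len(panel_readings) != 3:
        raise ValueError("expected readings from exactly three radiologists")
    return sum(radiologist_reports_union(r) for r in panel_readings) >= 2


def combined_success(healed_without_secondary_intervention, panel_readings):
    """Combined endpoint: investigator-reported healing without a secondary
    intervention AND radiographic union according to the panel."""
    return healed_without_secondary_intervention and panel_records_union(panel_readings)


# Hypothetical readings for one patient visit.
readings = [
    [True, True, True, False],   # radiologist 1: three cortices, reports union
    [True, True, True, True],    # radiologist 2: four cortices, reports union
    [True, False, False, True],  # radiologist 3: two cortices, no union
]
united = panel_records_union(readings)
```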


Investment of Resources 


When deciding whether or not to use central adjudication 
of outcomes, investigators must weigh the expected bene- 
fit of adjudication for accurate determination of outcomes 
against the investment of resources involved and practical- 
ity of undergoing this process.'* To centrally adjudicate 
outcomes for a trial, it takes a considerable amount of ad- 
ministrators’ and experts’ time to collect the relevant infor- 
mation, prepare the information for review, review each 
case, and participate in consensus meetings.'* For exam- 
ple, in a critical care trial, it took a total of 74 days of per- 
sonnel costs for four pairs of adjudicators to adjudicate 
282 cases of suspected ventilator-associated pneumonia.’ 
See Table 38.1, Table 38.2, and Table 38.3 for further infor- 
mation on resources, time, and costs involved in the adju- 
dication process. 

Walter et al analyzed the adjudication process used in a 
study of diagnosis of potentially operable lung cancer, in 
which five adjudicators determined tumor status and 
cause of death.'* They found that the results of the study 
would likely not have changed if two or three adjudicators 
had reviewed each case, thereby decreasing the effort spent 
on ascertaining outcomes. The time saved by having a 
smaller adjudication committee was estimated to be 44% if 
three adjudicators were used, and 63% if two were used. 
The authors conclude that when agreement for a particular 
outcome is generally high, the adjudication process would 
be optimized with two or three members. 


Table 38.1 Resources Used for Adjudication 

Study Outcome 
  • All information related to the reported study outcome 
    - Clinic records 
    - Hospital records, including operative reports 
    - Radiographs (and radiograph reports if needed) 
    - Laboratory reports 
    - Other diagnostic reports 

Adverse Events 
  • All information documenting adverse events 
    - Clinic records 
    - Hospital records 
    - Radiographs 
    - Laboratory reports 

Deaths 
  • All information documenting deaths 
    - Death certificate or autopsy report 
    - Hospital records or discharge summary 

Eligibility 
  • All information known about patient eligibility at the 
    time of enrollment 
    - Emergency room report 
    - Hospital records 
    - Initial radiographs 
    - Laboratory reports 
    - Other diagnostic reports 


Table 38.2 Time Required for the Adjudication Process 

Step 1: Requesting information (approximately two weeks): Time 
for study sites to collect and submit requested information. This 
will require more time if information is difficult to locate, missing, 
or requires the patient to sign an additional release form. 

Time-saver tips: 

1. At the outset of the study, make all study sites aware of what 
   information will be required for the adjudication process, so 
   they can get into a routine of sending the additional required 
   information automatically when reporting an event. 

2. Have an efficient system to organize all of the information 
   when it arrives. 

3. Verify all information as soon as it is received: identify and 
   request anything that is missing, or clarify any uncertainties, 
   immediately while the study site still has the chart readily on 
   hand. 

4. Communication is the key! Keep in touch with each site 
   regularly to give reminders, provide support, and problem solve. 

Step 2: Preparing adjudication packages (variable): Time for 
central methods center personnel to blind information and 
prepare a complete package for each adjudicator to review each 
event. Additional time may be needed to digitize items for review 
on a secure Web site. This step also requires preparing study forms 
for each case for the adjudicators to record their findings. Don’t 
forget to allow time for shipping. 

Step 3: Adjudicators reviewing each event (allow 2-4 weeks per 
batch of 20 cases): Time for adjudicators to review the relevant 
information for each event and record their findings on study 
forms. This will require more time if adjudicators require 
additional information to make their decision on a case. 

Step 4: Identifying disagreements and preparing for a consensus 
meeting (allow 1 week per batch of 20 cases): Time for personnel 
at the central methods center to review adjudicators’ responses 
and identify disagreements between adjudicators (this can be done 
by entering responses into a database and performing an analysis 
for disagreements). Time will also be required for methods center 
personnel to prepare and send materials for the consensus 
meeting to the adjudicators. 

Step 5: Achieving consensus (variable): Time for adjudicators to 
meet and resolve disagreements on the judgment of a study 
event. 

For example, a 2-hour meeting or phone conference could be held 
each month to achieve consensus on the latest batch of 20 cases. 
Alternatively, if you use pairs of adjudicators, they could meet 
separately as mutually convenient to discuss disagreements. 

The amount of time required will vary with the number of 
adjudicators, the number of judgments required, the number of 
events, the length of discussion required to reach an agreement, 
and whether additional information is needed to achieve consensus. 

Time-saver tip: Hold consensus meetings soon after the 
adjudicators have reviewed the cases individually; they will 
remember the cases more easily, which will save time during the 
meeting. 

Source: Data from Cook D, Walter S, Freitag A, et al. Adjudicating 
ventilator-associated pneumonia in a randomized trial of critically 
ill patients. J Crit Care 1998;13(4):159-163. 


Table 38.3 Costs to Include in Your Budget for the 
Adjudication Process 

Item                        Explanation 

Personnel                   Study site personnel costs to locate, copy, 
                            and submit relevant information 
                            Central methods center personnel costs to 
                            request, collect, and organize information 
                            and to prepare adjudication packages 

Administrative fees         For requesting hospital charts 

Radiographs                 Fees for duplicating radiographs 

Photocopying and supplies   Photocopying costs for duplicating relevant 
                            information 
                            Cost of CDs for duplicating and sending 
                            information available electronically 

Shipping                    Costs for sending information: 
                            - From study sites to methods center 
                            - From methods center to adjudicators 

Web administration          Web administration costs for posting 
                            adjudication material on secure Web site 

Source: Data from Cook D, Walter S, Freitag A, et al. Adjudicating 
ventilator-associated pneumonia in a randomized trial of critically 
ill patients. J Crit Care 1998;13(4):159-163. 
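
Step 4 of Table 38.2 mentions entering adjudicators' responses into a database and analyzing them for disagreements. A minimal sketch of that analysis is shown below, assuming responses are held as a mapping from case identifier to each adjudicator's judgment; the function name, identifiers, and judgment labels are hypothetical.

```python
def find_disagreements(responses):
    """Return the cases on which the adjudicators did not all give the same judgment.

    responses: {case_id: {adjudicator_name: judgment}}
    """
    disagreements = {}
    for case_id, judgments in responses.items():
        # More than one distinct judgment means the case needs consensus review.
        if len(set(judgments.values())) > 1:
            disagreements[case_id] = judgments
    return disagreements


# Hypothetical batch reviewed independently by one pair of adjudicators.
batch = {
    "case-001": {"adjudicator_A": "event", "adjudicator_B": "event"},
    "case-002": {"adjudicator_A": "event", "adjudicator_B": "no event"},
    "case-003": {"adjudicator_A": "no event", "adjudicator_B": "no event"},
}
to_discuss = find_disagreements(batch)
```

Only the cases returned here would need to go on the agenda of the consensus meeting.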


Adjudication and Possible Bias 


Investigators must also consider the potential for the adju- 
dication process itself to bias the results of the trial. Varia- 
tions in classifying study events can occur during the adju- 
dication process when there are (1) different thresholds for 
diagnosing the outcome between adjudication pairs, (2) 
low levels of agreement within pairs, and (3) dominance 
of one member in the pair during consensus decision-mak- 
ing. “The importance of measuring somewhat subjective 
outcomes in clinical research notwithstanding, potential 
bias inherent in their assessment needs to be weighed 
against potential bias introduced by an adjudication pro- 
cess itself.”° Some authors suggest using both a risk-benefit 
and cost-benefit ratio for deciding whether the adjudica- 
tion process is warranted.° 


Quality Assurance in the Adjudication Process 


Several strategies can be used to improve the process of 
adjudication and decrease sources of variation introduced 
by the adjudication process itself: you can provide 
standardized training for adjudicators, monitor agreement 
between adjudicators and direction of consensus decisions, 
ensure that adjudicators have similar thresholds for 
classifying study events, randomly have more than one team 
adjudicate a sample of cases, and randomize cases to 
adjudication teams to avoid biasing the treatment effect 
that may occur with differences in interpretation and 
criteria between teams.° 


Key Concepts: Strategies for Improving the Adjudication Process 

• Provide standardized training for adjudicators 
• Monitor agreement between adjudicators and direction 
  of consensus decisions 
• Ensure that adjudicators have similar thresholds for 
  classifying study events 
• Randomly have more than one team adjudicate a 
  sample of cases 
• Randomize cases to adjudication teams to avoid bias 
  due to differences in interpretation and criteria 
  between teams 
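
The last two strategies can be sketched in a few lines: cases are shuffled and dealt to teams in turn, and a random sample is also sent to a second team so that agreement between teams can be monitored. The function name, the 10% duplicate fraction, and the team labels are assumptions made for illustration only.

```python
import random


def allocate_cases(case_ids, teams, duplicate_fraction=0.1, seed=None):
    """Randomly allocate cases to adjudication teams.

    Returns {case_id: [team, ...]}. Every case gets one randomly assigned team;
    a random sample of cases gets a second, different team as well.
    """
    if len(teams) < 2:
        raise ValueError("at least two teams are needed for duplicate review")
    rng = random.Random(seed)
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    # Dealing the shuffled cases out in turn keeps team workloads balanced.
    allocation = {case: [teams[i % len(teams)]] for i, case in enumerate(shuffled)}
    n_duplicates = round(duplicate_fraction * len(shuffled))
    for case in rng.sample(shuffled, n_duplicates):
        first_team = allocation[case][0]
        second_team = rng.choice([t for t in teams if t != first_team])
        allocation[case].append(second_team)
    return allocation


cases = [f"case-{i:03d}" for i in range(20)]
allocation = allocate_cases(cases, ["team_1", "team_2"], duplicate_fraction=0.1, seed=1)
```

Fixing the seed makes the allocation reproducible, which can help when documenting the process for an audit.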


The Use of the Internet in the Adjudication Process 


The use of Internet technology can aid in the adjudication 
process, especially if the adjudicators are located in 
different cities and countries. Using a secure Web site, the 
central methods center can present adjudicators with 
blinded information on each event for assessment. This 
process may be especially valuable for viewing radiographs, 
as radiographs are increasingly stored at hospitals in a 
digital format, and viewing them digitally preserves the 
quality of the image for assessment. Hospital records may 
also be available electronically or can be converted to an 
electronic file. This reduces the amount of paperwork 
required for the adjudication process, and saves the cost 
and time of shipping materials to adjudicators. 

The limitations of using the Internet in the adjudication 
process include the cost and expertise required to 
administer the Web site, the need to ensure the security of 
patient information, and the cost and time required to 
convert information to an electronic format if only a hard 
copy is available. 

This process will become increasingly attractive as 
investigators gain experience with conducting adjudication 
over the Internet, and as more hospital records and 
radiographic information become available electronically. 
This process has the potential to save on the cost and time 
involved in adjudicating study events, and to improve the 
quality of the adjudication process. 


Conclusions 


The primary purpose of separate adjudication of outcomes 
is to reduce ascertainment and detection bias. Also, central 
adjudication of outcomes can be used to reduce variations 
in classifying events due to different interpretations of the 
definition of the event. The decision of investigators to 
centrally adjudicate outcomes depends on the primary 
outcome of the study and the availability of resources. 
Several strategies exist to improve the adjudication process 
and decrease the bias that can be produced by the process 
itself. The adjudication process will become increasingly 
attractive with the potential of Internet technology, which 
can save costs and time involved in adjudicating study 
events. 


Data Management 


“When I took office, only high-energy physicists had ever heard of what is called the World 
Wide Web ... now even my cat has its own web page.” 
— President Bill Clinton 


Summary 


Various methods for the management and transfer of data 
between a clinical center and the central methods center 
are described in this chapter. 


Introduction 


As randomized controlled trials have become an 
increasingly popular method of addressing clinically 
important questions in orthopaedic surgery, it is imperative 
that national and international collaborations are nurtured 
by multicenter initiatives.¹ One current limitation of 
multicenter trials is the perceived difficulty of compiling 
and transferring data from a clinical center to the central 
methods center. There are several methods of facilitating 
data transfer and data management, including 
Internet-based data entry systems, fax-based data 
management systems, and submission of data by mail to 
the central methods center. This chapter describes how all 
three methods of data submission work and the advantages 
and disadvantages of each system. 


Management of Data by Internet 


Internet-based clinical studies have been found to be a 
rapid, easily accessible, safe, and secure method of 
performing multicenter clinical trials.²⁻⁴ Surgeons who 
participate in multicenter clinical trials need a convenient, 
inexpensive, secure way to record and manage data. The 
Internet provides a vast network that enables site principal 
investigators and their clinical research coordinators to log 
data through a common interface; data security, accuracy, 
and efficiency of trial conduct can be achieved, while 
lowering cost. 

The key elements of the Internet-based data management 
system are as follows: (1) the ability to enter, update, 
and correct patient information at participating clinical 
sites; (2) the generation of quality control reports; 
(3) instant data analysis through preprogrammed queries; 
and (4) data security and confidentiality to ensure that site 
principal investigators and their clinical research 
coordinators have access to only their own site’s enrolled 
patients.¹ 

Data access can be individualized so that certain 
elements are available based on the function of the user. 
The Internet-based data management system is developed 
and maintained by the data manager at the central 
methods center. The data manager is the only individual 
with the rights to delete data if deemed necessary, add new 
clinical centers to the data management system, add new 
users to each clinical center, and assign the level of access 
rights to each user with a username and password. The data 
manager, the data analyst, and the study coordinator work 
together to prepare and review reports of data that have 
been entered over the Internet by the clinical centers. The 
data manager and the data analyst also have access to the 
data analysis section of the system. 

At each participating clinical center, the clinical research 
coordinator is able to enter new patient data and update 
any patient information directly into the Web-based sys- 
tem. In addition, the clinical research coordinator should 
have access to update and view quality control reports. 
Data content is strictly organized so that users at 
individual centers have access only to their own data. 
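
The access rules described in this section can be illustrated with a small sketch. It is not the system used in the cited trials: the role names, the `User` class, and the choice of which roles count as central are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    username: str
    role: str            # hypothetical role names, see CENTRAL_ROLES below
    site: Optional[str]  # clinical center; None for central methods center staff


# Assumed here: central methods center roles may see data from all sites.
CENTRAL_ROLES = {"data_manager", "data_analyst", "study_coordinator"}


def visible_records(user, records):
    """Site users see only their own center's patients; central roles see all."""
    if user.role in CENTRAL_ROLES:
        return list(records)
    return [r for r in records if r["site"] == user.site]


def can_delete(user):
    """Only the data manager holds the right to delete data."""
    return user.role == "data_manager"


records = [
    {"patient_id": 1, "site": "Center A"},
    {"patient_id": 2, "site": "Center B"},
]
coordinator = User("crc_center_a", "site_coordinator", "Center A")
manager = User("central_dm", "data_manager", None)
own_site_only = visible_records(coordinator, records)
```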

The data management Web site can be linked to the 
randomization Web site. It can also provide general 
information about the trial design and protocol, including 
the eligibility criteria; a short overview of enrolled, 
excluded, and missed patients; and printable case report 
forms (in a format that prevents the forms from being 
modified accidentally) for data that cannot be entered 
directly into the database, such as functional outcome 
questionnaires that patients complete on paper. 

An Internet-based data management system is primarily 
based on a structured query language (SQL) database 
system such as MySQL or Microsoft SQL.’ The online 
interface that connects to the database can be written in 
various programming and scripting languages, such as 
Visual Basic Script or JavaScript, to enhance interactivity, 
data processing, and data validation. A Microsoft Access 
database can also be used; however, the limiting factor of 
Access databases is a maximum of 255 simultaneous 
users.¹ Furthermore, compared with a large-scale database 
like MySQL, Microsoft Access processes data more slowly, 
which may lead to delays in Web pages that retrieve case 
report forms. Additionally, the data back-up capabilities 
of Microsoft Access databases are limited. In this study, a 
Microsoft SQL database and active server pages (ASP) 
technology, Visual Basic Script, and JavaScript are utilized.¹ 

The advantages of an Internet-based data management 
system are its availability, ease of use, preservation of 
data integrity, improved data organization, instant quality 
control, instant data analysis of key outcome parameters, 
and overall cost efficiency.¹ Errors are prevented at the 
time of data entry by instant data validation. For example, 
a nonsensical data input, such as a date of surgery that 
precedes the date of injury, is flagged by an alert box on 
the screen. If the user misses a data point by mistake, he or 
she is also presented with an alert message. These alerts 
can either be messages to the user or a data entry “freeze” 
until the data are entered correctly. Preservation of data 
integrity is an essential prerequisite for an accurate data 
analysis. Internet-based data management systems also 
allow the study coordinator and the data manager at the 
central methods center to have a real-time overview of the 
quality of the data. This enables quick and efficient 
improvements in the data acquisition processes when 
required. 
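
A minimal sketch of the kind of validation at data entry described above is given below. The field names and alert wording are hypothetical; a real system would typically run such checks in the Web form and again on the server.

```python
from datetime import date

# Hypothetical required fields on an enrollment case report form.
REQUIRED_FIELDS = ("patient_id", "date_of_injury", "date_of_surgery")


def validate_case_report(form):
    """Return a list of alert messages; an empty list means the form passes."""
    alerts = []
    # Flag data points the user skipped.
    for field in REQUIRED_FIELDS:
        if form.get(field) in (None, ""):
            alerts.append(f"Missing value: {field}")
    # Flag nonsensical input: surgery cannot precede the injury.
    injury = form.get("date_of_injury")
    surgery = form.get("date_of_surgery")
    if isinstance(injury, date) and isinstance(surgery, date) and surgery < injury:
        alerts.append("Date of surgery precedes date of injury")
    return alerts


entry = {
    "patient_id": "A-017",
    "date_of_injury": date(2024, 3, 10),
    "date_of_surgery": date(2024, 3, 8),
}
alerts = validate_case_report(entry)
```

To "freeze" data entry, the interface would simply refuse to save the form until the returned list is empty.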

A potential disadvantage of an Internet-based data 
management system is the initial financial investment, as 
the cost of programming the Internet infrastructure for a 
large multicenter clinical trial can range from $5,000.00 to 
$100,000.00.¹ However, although initial development 
costs of Internet-based data management systems can be 
high, they can prove to be cost-effective over the course of 
the trial by increasing efficiency of data management and 
therefore decreasing the cost of faxing, mailing, 
telephones, and travel.¹ Initial costs can be minimized by 
careful study planning, and once a system is implemented 
the maintenance costs are usually low. 

Another disadvantage of Internet data entry is that not 
all case report forms can be completed directly onto the 
Internet case report forms; thus, paper case report forms 
are still required. For instance, case report forms completed 
directly by the research participant on their functional 
outcome or quality of life may need to be completed on 
paper, as a computer with Internet access may not be 
available in the patient’s hospital room or in the fracture 
clinic, or the research participant may not be comfortable 
using the computerized case report forms. In addition, 
some clinical research coordinators find that it is difficult 
to transcribe data from the patient’s electronic or medical 
chart directly onto the Internet case report forms. As a 
result, paper copies of the case report forms are often used, 
and the clinical research coordinator later enters the data 
from them into the Internet data management system. 
This can result in increased work for the clinical research 
coordinator and a delay in the central methods center 
receiving the data. 

Internet security is another issue that must be addressed 
when setting up an Internet-based data management 
system. The system should be protected by several 
firewalls and backed up daily, to prevent loss due to 
computer viruses, worms, and hackers. 


Key Concepts: Advantages and Disadvantages to 
Internet-based Data Management 

Advantages 

• Availability 
• Ease of use 
• Preservation of data integrity 
• Improved data organization 
• Instant quality control 
• Instant data analysis of key outcome parameters 
• Efficiency 

Disadvantages 

• Cost 
• Not all case report forms can be completed directly 
  onto the Internet 
• Potential security difficulties 


Management of Data by Fax 


Fax-based data management systems are a common and 
efficient way of managing data in large, multicenter 
clinical trials. Fax-based systems can also be a good system 
to use for smaller trials and single-center initiatives. 
Multiple software programs are available to manage 
clinical trial data by fax; this section uses the Clinical 
DataFax System (Clinical DataFax System Inc., Hamilton, 
Ontario, Canada) as an example of how a fax-based data 
management system can be utilized. 

DataFax is the premier fax-based data management 
system for clinical trials. Since its introduction in 1990, 
DataFax has been adopted by pharmaceutical companies, 
contract research organizations, and universities in the 
United States, Canada, South America, Europe, Africa, and 
Asia.” DataFax has been successfully used by research 
groups managing many concurrent trials, as well as by 
research collaboratives coordinating large international 
trials with thousands of patients and hundreds of clinical 
sites. 

Key features of the DataFax system include: (1) simple 
technology; (2) intelligent character recognition; (3) real- 
time edit checks; (4) automated quality control reports; 
and (5) study management reports. DataFax uses simple 
technology as the clinical site coordinators complete paper 
case report forms and fax them directly to the DataFax data 
management system using an ordinary fax machine. Often 
a toll-free number is set up by the central methods center 
to help reduce the costs for the clinical sites. DataFax iden- 
tifies the faxes it receives by using intelligent character re- 
cognition. The DataFax software receives the faxed case re- 
port forms, identifies the study they belong to, scans the 
data fields, populates the appropriate database tables, 
and queues the new data records for the research assistant 
to validate. The case report forms are viewed as a split 
screen, with the database copy on the top of the screen 
and the image of the case report form on the bottom of 
the screen (Fig. 39.1). 

The DataFax data management system has real-time 
edit checks. The split-screen review of case report forms 
is assisted by programmed edit checks, which can access 
any item stored in the database. 

Quality control reports are automatically generated by
the DataFax data management system. Data queries flagged
during case report form validation, together with queries
that DataFax creates automatically for overdue visits
and missing pages, are faxed to the clinical research
coordinators at the clinical sites for resolution (Fig. 39.2).

The DataFax system also includes standard reports for
tracking center performance, listing patient data, monitoring
individual work flow, detecting deviations from protocol,
and producing audit trail reports from the transaction
journals.
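
The automatic queries for overdue visits and missing pages can be sketched as follows. This is an illustration only, not the DataFax implementation: the data layout, the function names, and the 14-day grace period are assumptions made for the example.

```python
from datetime import date, timedelta


def overdue_visit_queries(expected_visits, received, today, grace_days=14):
    """List visits whose due date (plus a grace period) has passed with no form.

    expected_visits: {patient_id: {visit_name: due_date}}
    received: set of (patient_id, visit_name) pairs already in the database
    """
    queries = []
    for patient_id, visits in sorted(expected_visits.items()):
        for visit_name, due in sorted(visits.items(), key=lambda kv: kv[1]):
            late = today > due + timedelta(days=grace_days)
            if late and (patient_id, visit_name) not in received:
                queries.append((patient_id, visit_name, "Overdue visit"))
    return queries


def missing_page_queries(expected_pages, received_pages):
    """List form pages expected for a visit but not yet received.

    expected_pages: {(patient_id, visit_name): [page, ...]}
    received_pages: set of (patient_id, visit_name, page) tuples
    """
    queries = []
    for (patient_id, visit_name), pages in sorted(expected_pages.items()):
        for page in pages:
            if (patient_id, visit_name, page) not in received_pages:
                queries.append((patient_id, f"{visit_name} {page}", "Missing page"))
    return queries
```

The two lists together correspond to the kind of content a quality control report sends back to a site: what is late, and which pages still need to be faxed.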

DataFax is in the process of developing an Internet data
management system, which will be set up in a similar
manner to its fax system. This will give the clinical
centers the option of submitting their completed case
report forms to the central methods center either
by fax or through the Internet. As the Internet becomes
more widely used, and as technology advances, fax-based
systems may become obsolete. Currently, however, a
fax-based system remains a good way to submit data.

Advantages of a fax-based data management system in- 
clude its ease of use once it has been programmed, preser- 
vation of data integrity, data organization, and good quality 
control. Fax-based data management systems also allow 
the study coordinator and the data manager at the central 
methods center to have a real-time overview of the quality 
of the data. This enables quick and efficient improvements 
in the data acquisition processes when required. 

A potential disadvantage of using a fax-based data
management system such as DataFax is the initial financial
investment: the cost of purchasing, programming, and
maintaining a fax-based system for a large multicenter
clinical trial can range from $25,000 to over $100,000.
Although the initial development cost of a fax-based data
management system is high, it can be offset by running
multiple clinical trials through the same system. Another
potential disadvantage of a fax-based data management
system is the actual faxing. Toll-free line charges and
long-distance fees can be quite expensive. In addition, a
fax sometimes comes through unreadable and the clinical
center must resend it, which increases costs and can lead
to frustration.


Fig. 39.2 DataFax screen of a Quality Control Report.


Key Concepts: Advantages and Disadvantages to
Fax-based Data Management

Advantages:
• Ease of use after it has been set up and programmed
• Preservation of data integrity
• Instant quality control
• Instant data analysis of key outcome parameters
• Efficiency

Disadvantages:
• Cost
• Problems with faxed image quality


Management of Data by Mail


Prior to the technological advances of Internet-based and
fax-based data management systems, data were submitted
from the clinical centers to the central methods center by
mail. Even with the recent advances in technology, a
mail-based data management system can still be efficient,
especially in small studies with limited funding.

In a mail-based data management system, the clinical
centers collect the data on paper case report forms and
then send the forms in batches to the central methods
center. The case report forms should be sent by courier to
ensure that they do not get lost in the mail and are easily
tracked. A copy or the original case report form must be
retained at the clinical center for its records. Often three-
or four-part carbonless forms are used to reduce the
time required for photocopying the case report forms.
Once the case report forms have been received by the
central methods center, a research assistant enters them
manually into the database.

A database can be set up easily at the central methods
center in a program such as Microsoft Access, which
supports data entry onto forms that look similar to the
case report form and allows some logic checks. The costs
are limited to purchasing the software, which is reasonably
priced. Because programs such as Microsoft Access are
easily programmed, a less-experienced data manager or even
the study coordinator will be able to do the programming,
at a lower cost. Entering data into Microsoft Excel or
directly into a statistical package such as SPSS (SPSS Inc.,
Chicago, IL) is also an efficient option if there is not
a lot of data to enter and monitor.

The primary advantages of a mail-based data management
system are the reduction in cost and the simple
technology involved in setting up and maintaining the
database. In general, the more complex the study, the
harder it will be to keep track of data using a mail-based
system. Other things to consider are the storage of the
paper case report forms that are submitted to the central
methods center; the cost of couriering the case report
forms; and the cost of the additional time the research
assistant requires to manually enter the data, summarize
the queries and outstanding data, and contact the clinical
sites for the resolution of queries.

A problem with mail-based data management systems is
the resolution of queries and the maintenance of
high-quality data. Queries must be generated by hand, then
typed and sent to the clinical site by e-mail or fax. The
resolution of queries often takes a long time: there is a
delay while the data on the paper case report forms are
collected at the clinical site, mailed to the central methods
center, and entered into the database. The last step in the
process is to summarize the queries and outstanding
data and send the quality control reports to the clinical
sites. This process can take weeks or months, depending
on the complexity of the case report forms and the size
of the trial. In some cases, this can lead to frustration at
the clinical site, as staff may need to retrieve the medical
charts, or the original source data may no longer be
available because of the time lag.

In some studies, staff from the central methods center
travel to the clinical site to verify the data and then take
the clean data directly to the central methods center for
entry into the database. This adds travel costs, but it
usually reduces the number of queries and increases the
quality of the data.


Key Concepts: Advantages and Disadvantages to
Mail-based Data Management

Advantages:
• Availability
• Ease of use
• Simple technology
• Cost

Disadvantages:
• Delay in receiving, entering, and cleaning data
• Quality control reports must be generated manually


Other Considerations in Data Management


Regardless of the type of data management system that is
used in a clinical trial, it is important that all patient data
are de-identified to comply with government patient
privacy legislation. All research participants must be
referred to by a study identification number instead of
their name, hospital identification number, or insurance
number. The data must be stored on a secure server, and
issues involving the physical site of the data storage must
be clearly defined in the institutional review board or
research ethics board applications of all involved sites. In
addition, the nominated principal investigator and the site
principal investigators need to be aware of everyone who
has access to the information stored in the database to be
sure that the data are being used appropriately.¹
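
As a rough illustration of referring to participants by study identification number only, the sketch below strips direct identifiers from a record before it leaves the clinical site. The field names, the `StudyIdRegistry` class, and the "centre-patient" ID format are hypothetical; actual requirements depend on the applicable privacy legislation and the ethics board approval.

```python
# Fields treated as direct identifiers in this sketch (an assumption).
DIRECT_IDENTIFIERS = ("name", "hospital_id", "insurance_number")


class StudyIdRegistry:
    """Maps each participant to a study ID; the key stays at the clinical site."""

    def __init__(self, centre_number: int):
        self.centre_number = centre_number
        self._key = {}  # hospital_id -> study_id, kept apart from trial data
        self._next_patient = 1

    def study_id_for(self, hospital_id: str) -> str:
        # Assign the next sequential number the first time a patient is seen.
        if hospital_id not in self._key:
            study_id = f"{self.centre_number:02d}-{self._next_patient:03d}"
            self._key[hospital_id] = study_id
            self._next_patient += 1
        return self._key[hospital_id]


def deidentify(record: dict, registry: StudyIdRegistry) -> dict:
    """Return a copy of the record with direct identifiers replaced by a study ID."""
    clean = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    clean["study_id"] = registry.study_id_for(record["hospital_id"])
    return clean
```

Only the de-identified copy would be sent to the central methods center; the linking key remains under the control of the site investigator.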


References


1. Zlowodzki M, Bhandari M, Driver R, Obremskey WT, Kregor PJ.
   Beyond the basics - Internet based data management. Tech Orthop
   2004; 19:88-93

2. Lallas CD, Preminger GM, Pearle MS, et al. Internet based
   multi-institutional clinical research: a convenient and secure
   option. J Urol 2004; 171:1880-1885

3. Marks RG, Conlon M, Ruberg SJ. Paradigm shifts in clinical trials
   enabled by information technology. Stat Med 2001; 20:2683-2696

4. Marshall WW, Haley RW. Use of a secure Internet Web site for
   collaborative medical research. JAMA 2000; 284:1843-1849

5. DataFax Data Management Systems. Available at:
   http://www.DataFax.com. Accessed August 1, 2006


40 


Study Case Report Forms 


“The whole of science consists of data that, at one time or another, were inexplicable.”
— Dr. Brendan O’Regan


Summary


The objective of this chapter is to guide surgeons through
the development of the case report form (CRF) during
the preparation of a trial. Documentation is introduced
by describing the two types of documents used to record
all trial-related material: source documents and essential
documents. The CRF development process is then discussed
using a four-stage model: predevelopment, content design,
style design, and review. The predevelopment stage begins
once the study protocol has been established and ends at
the creation of the data form. Two concurrent stages,
content design and style design, make up the construction
of the form. The last stage involves a comprehensive review
of the forms using a variety of evaluation and feedback
methods.


Introduction


The success of a clinical trial depends largely on the proper
documentation of all trial-related activities. Documents
provide a lasting record of the way a trial was conducted.
There are two types of documents in a clinical trial: source
documents and essential documents.¹ Source documents
are the original documents, data, and records relating to
trial patients. Examples of source documents include
patients’ medical records, results from procedures such as
radiographs, laboratory test reports, and patient diaries.¹
Essential documents permit trial conduct to be evaluated.
Examples of essential documents include the protocol and
amendments, investigator's brochure, research ethics
board (REB) letter of approval, informed consent forms,
delivery receipts, and case report forms (CRFs).¹ It is crucial
to document all trial-related activities and file the
documentation appropriately. Remember that clinical trials
are designed to generate data. These data are essential for
analyzing and reporting the trial's safety and efficacy
results; safeguarding the rights, safety, and well-being of
trial patients; and creating an audit trail.¹


Jargon Simplified: Source Documents

Source documents are the original documents, data, and
records related to trial patients. Examples include medical
records, results from diagnostic tests, and laboratory
results.


Jargon Simplified: Essential Documents

Essential documents are used to allow the trial to be
evaluated in light of its objectives. Examples include
protocol and amendments, ethics committee letter of
approval, informed consent forms, and case report
forms.


Key Concepts: Importance of Effective Data Collection 
The proper design of data collection during the clinical 
trial is essential not only for analyzing and reporting 
the trial results, but also for safeguarding the rights, 
safety, and well-being of study patients, as well as for 
creating an audit trail. 


During the preparation of every clinical trial, standardized
CRFs are derived from the study protocol. The CRF is an
essential document that guides data collection by providing
a standardized means of recording pertinent trial-related
information from source documents.¹ CRFs are the basic
tool for ensuring that data collection is consistent and
systematic for all patients involved in a trial. The
development of CRFs is an intricate process that must be
done carefully and effectively to ensure the success of a
trial. The CRFs should be designed to capture enough data
to fully evaluate the research questions asked in the
protocol and to collect adverse event data for safety reports
and processes. They should not, however, ask for more
information than is necessary to reach a meaningful
conclusion. It is therefore crucial that the CRF is well
designed, unambiguous, and professional.


Key Concepts: Four Stages of Case Report Form
Development

1. Predevelopment

2. Designing the content

3. Designing the style

4. Review


Jargon Simplified: Case Report Form

The case report form (CRF) is the data form used to
record trial-related information about each patient
involved in a study. It is designed to capture only the data
required for the analysis of study results and is crucial to
the success of a clinical trial.


Stage 1: Predevelopment


The predevelopment phase is a short period that occurs
after the study protocol has been established and agreed
upon by all study investigators. The main objective of
this phase is to become familiar with all parts of the
study protocol. It is valuable to have an understanding of
the research objectives, research question, rationale
behind the study, and previous literature (if any exists)
addressing the issue. Moreover, it is imperative that you
become familiar with the research design, methods, and
analysis plan. This section of the protocol provides all of
the information that you will need to include in the
content of the CRFs. Know the protocol well and keep a copy
readily available during the content design phase; it will
be a critical resource.²


Key Concepts: Case Report Form Predevelopment
Stage

The most important part of the predevelopment phase
of the case report form, which occurs after the study
protocol has been established, is becoming familiar
with the research design, methods, and analysis plan.


Stage 2: Designing the Content


The design of the CRF is crucial to the success of your
clinical trial. For this reason, design standards in content
and style can be applied to ensure the proper organization
and visual appeal of the forms. A meticulous and careful
design strategy will generate better comprehension and
easier completion of CRFs for clinicians, which results in
fewer questions and queries in later stages of the trial.

The information used to develop the content of the CRF
can be derived from the research design, methods, and
analysis section of the study protocol. You should already
be familiar with the protocol, as this was the most important
part of the predevelopment phase in the data form
development model. The CRF is composed of a compilation of
forms that correspond to various stages of the trial. As
mentioned previously, there is one set of CRFs for each
patient. The standard CRF module for surgical randomized
controlled trials (RCTs) has the following components:
screening, randomization, baseline characteristics,
medications, preoperative information, surgical report,
postoperative information, follow-up report, protocol
deviations, adverse events, and early withdrawal. Each study
will have a unique set of CRFs depending on the nature
of the question and the outcomes of the trial. Therefore,
the standard list should be used as a reference model
only and can be altered to suit the needs of your study.
Keep this in mind as we discuss the content of each
component in the standard model.

We will now go through each form included in the standard
model of the CRF. Samples (as shown in the figures)
are provided of the CRFs developed in a trial comparing
different fluid irrigation techniques used in open fracture
wounds (FLOW). A figure illustrates how each form looks
once it is developed, providing an overview of the content
to include in each form.


Key Concepts: The Standard Case Report Form Module
for Surgical Randomized Controlled Trials

• Screening form
• Randomization form
• Baseline characteristics form
• Medications log
• Preoperative information form
• Surgical report
• Postoperative information form
• Follow-up report
• Protocol deviations form
• Adverse events form
• Early withdrawal form


Screening Form 


The inclusion and exclusion criteria are contained in the
screening form. This form must be completed for every
patient in the population to whom the study applies. For
example, in the FLOW study, every patient who presents to
the participating hospital with an open fracture wound may
be eligible to be enrolled in the study. It is helpful to
specify this broad eligibility criterion at the top of the
form so that there is no confusion as to whom to screen.
Following these instructions, it is important to label the
inclusion criteria and exclusion criteria clearly. One
suggestion is to separate the sections using shading for
easy visual analysis. A helpful way of recording these data
is in columns of preprinted yes/no boxes. All inclusion
criteria should be asked in the form of a question to which
the answer is “yes” if the patient passes the criterion. All
exclusion criteria should be asked in the form of a question
to which the answer is “no” if the patient passes the
criterion. It is then visually intuitive whether the patient
should be enrolled in the study (i.e., all “yes” answers for
inclusion criteria and all “no” answers for exclusion
criteria).² Lastly, this form should conclude with the
patient’s status. This may include a statement with check
boxes indicating whether the patient is included, excluded,
or missed (eligible, but not randomized due to error). There
should be instructions after each option telling the
clinician which form to complete next. Figure 40.1 shows
the beginning of the Screening Form used in the FLOW study.


FLOW SCREENING FORM                                  Form 1.1
STUDY #082   Plate #001   Visit #000
Patient ID Number (Centre #, Patient #)   Patient Initials (F L)   Date
SCREENING FORM (1 of 1) - FORM 1.1

Please complete this form for all patients with an open fracture wound.

For included patients you must answer yes to questions 1-4:      Yes  No
1. Patient is 18 years old or over?
2. Open fracture of the upper and/or lower extremities or pelvis?
3. Fracture treated within 24 hours?
4. Patient or proxy has provided informed consent?
If you answered no to any of items 1-4 the patient should be excluded.

For included patients you must answer no to questions 5-14:      Yes  No
5. Cognitive or language barriers that would limit completion of
   quality of life questionnaires in English?
6. Anticipated problems with follow-up in the judgement of the
   resident or attending surgeon?
7. Known allergy to detergent or Castile soap ingredients?

Fig. 40.1 A sample section of a screening form.


Key Concepts: Information Contained in the
Screening Form

• Inclusion criteria
• Exclusion criteria
• Patient status
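
The yes/no convention described above lends itself to a simple rule for deriving the patient status. The following is a minimal sketch under stated assumptions: the function name, the `Status` values, and the use of `True`/`False`/`None` for yes/no/blank are all illustrative and are not taken from the FLOW study.

```python
from enum import Enum


class Status(Enum):
    INCLUDED = "included"
    EXCLUDED = "excluded"
    MISSED = "missed"  # eligible, but not randomized due to error


def patient_status(inclusion: dict, exclusion: dict, randomized: bool) -> Status:
    """Apply the convention: all inclusion answers 'yes', all exclusion answers 'no'.

    Each dict maps a question label to True (yes), False (no), or None (blank).
    """
    answers = list(inclusion.values()) + list(exclusion.values())
    if any(answer is None for answer in answers):
        raise ValueError("every screening question must be answered")
    eligible = all(inclusion.values()) and not any(exclusion.values())
    if not eligible:
        return Status.EXCLUDED
    return Status.INCLUDED if randomized else Status.MISSED
```

A check like this could also serve as an edit check at the methods center, flagging forms where the ticked status disagrees with the answers.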


Randomization Form 


The randomization form contains any information that you
may need available at the time of randomization,
instructions on how to randomize, and randomization status.
For example, information that you may need available at
the time of randomization may include the patient’s date
of birth or the patient’s fracture severity. This information
is very specific to the trial and varies dramatically between
trials. The second component of the randomization
form is the instruction on how to randomize a patient.
Every trial will employ its own method of randomization.
The most adequate randomization methods include
telephone or Internet systems.² The type of randomization
used will be specified in the protocol and must match
the instructions stated in the CRF. Lastly, it may be helpful
to include the randomization status of the patient.
Depending on the nature of the trial, this information may
be concealed from those collecting the data, in which case
this section would not be included. Figure 40.2 shows part
of the randomization form used in the FLOW study.


Date of randomization: Day / Month / Year

2. Patient randomized to:
   Group 1: castile soap, low pressure
   Group 2: castile soap, high pressure
   Group 3: saline solution, low pressure
   Group 4: saline solution, high pressure

3. Initials of person who randomized patient:

Fig. 40.2 A sample section of a randomization form.


Key Concepts: Information Contained in the
Randomization Form

• Information needed at the time of randomization
• Instructions on how to randomize patients
• Randomization status (depending on trial)
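
To make the link between a central randomization system and the fields on the randomization form concrete, here is a small sketch of a central allocator. It is an assumption-laden illustration: the chapter does not state how FLOW allocated patients, and the permuted-block scheme, the class name `CentralRandomizer`, and the log layout are introduced here for the example only. The four group labels are those shown in Fig. 40.2.

```python
import random

GROUPS = (
    "Group 1: castile soap, low pressure",
    "Group 2: castile soap, high pressure",
    "Group 3: saline solution, low pressure",
    "Group 4: saline solution, high pressure",
)


class CentralRandomizer:
    """Hands out allocations from shuffled blocks and keeps a log of each call."""

    def __init__(self, seed, block_repeats=1):
        self._rng = random.Random(seed)
        self._block_repeats = block_repeats
        self._pending = []  # allocations left in the current block
        self.log = []       # (patient_id, group, initials) per randomization

    def _new_block(self):
        # Each block contains every group the same number of times.
        block = list(GROUPS) * self._block_repeats
        self._rng.shuffle(block)
        return block

    def randomize(self, patient_id, initials):
        if any(entry[0] == patient_id for entry in self.log):
            raise ValueError(f"patient {patient_id} is already randomized")
        if not self._pending:
            self._pending = self._new_block()
        group = self._pending.pop()
        self.log.append((patient_id, group, initials))
        return group
```

The log entries mirror items 2 and 3 of the sample form, so the paper record and the central record can be compared during data validation.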


Baseline Characteristics Form 


Baseline characteristics vary widely across trials. The data 
collected in this section are used to describe the baseline 
demographic data of the study population (i.e., the pa- 
tients participating in the study). Common questions in- 
clude age, sex, ethnicity, smoking history, height, weight, 
comorbidities, alcohol consumption, and medications. 
The FLOW study, an orthopedic study, included additional 
questions such as the date of the patient’s injury, date of 
their hospitalization, location of their injury, and mechan- 
ism of their injury. Each study will have specific questions 
that relate to the nature of the trial. Remember that all in- 
formation collected in this form will later be used to de- 
scribe the demographics of the study population. Figure 
40.3 shows part of the baseline characteristics form used 
in the FLOW study. 


Key Concepts: Information Contained in the Baseline
Characteristics Form

• Demographic data (age, sex, ethnicity, etc.)
• Trial-specific demographic data (e.g., fracture location,
  additional injuries)


Medications Log


The medications log allows clinicians to record any medi- 
cation the patient is receiving at various points throughout 
the trial. When recording medications, it is important to 
include the name of the drug, dose, unit, route, and fre- 
quency of administration. Also, record the reason for ad- 
ministration (e.g., sedative) and the start/stop dates. 
Some drugs may be ongoing; therefore, it is important to 
have a check box indicating this. Figure 40.4 shows part 
of the medications log used in the FLOW study. 


Key Concepts: Information Contained in the
Medications Log

• Medication (name, dose, unit, route, frequency)
• Reason for administration
• Start/stop date (or ongoing)
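
The items listed for the medications log map naturally onto a single record type. The sketch below is illustrative only: the class name, field names, and the three consistency rules are assumptions, not part of the FLOW forms.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class MedicationEntry:
    """One line of a medications log, following the items listed above."""
    name: str
    dose: float
    unit: str
    route: str
    frequency: str
    reason: str
    start_date: date
    stop_date: Optional[date] = None
    ongoing: bool = False

    def problems(self) -> list[str]:
        """Return consistency problems that would warrant a data query."""
        issues = []
        if self.ongoing and self.stop_date is not None:
            issues.append("Marked ongoing but a stop date is recorded")
        if not self.ongoing and self.stop_date is None:
            issues.append("Stop date missing and 'ongoing' not checked")
        if self.stop_date is not None and self.stop_date < self.start_date:
            issues.append("Stop date is earlier than start date")
        return issues
```

Keeping the "ongoing" flag separate from the stop date makes it possible to distinguish a drug that is still being taken from a stop date that was simply left blank.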


FLOW BASELINE CHARACTERISTICS FORM                   Form 3.4
STUDY #082   Plate #006   Visit #001
Patient ID Number (Centre #, Patient #)   Patient Initials (F L)
BASELINE CHARACTERISTICS FORM (4 of 4) - FORM 3.4

11. Smoking history: No / Yes / Yes, quit
    If yes, specify: How long (years); Packs per day (a);
    Age began (years); Packs per day (b); Age quit (years)
12. Do you consume alcohol? If yes, please specify the amount on
    average you drink per week. Yes / No
    If yes, specify: Drinks per week
13. Do you currently use recreational IV drugs? If yes, please specify
    the type and amount on average you use. Yes / No
    If yes, specify: # times per week or per month; type

Fig. 40.3 A sample section of a baseline characteristics form.


FLOW MEDICATIONS LOG                                 Form 4.1
STUDY #082   Plate #010   Visit #001
Patient ID Number (Centre #, Patient #)   Patient Initials (F L)
Check Follow-Up Visit: Baseline / Discharge / 2 weeks / 6 weeks /
3 months / 6 months / 9 months / 12 months
MEDICATIONS LOG - FORM 4.1

Columns: # | Medication (e.g., Diazepam, Dose: 5, ...): name, dose,
unit, route, frequency | Reason for Administration (e.g.,
sedative/anxiety) | Start Date and Stop Date (DD MM YYYY), with a
"Check if stopped" box

Fig. 40.4 A sample section of a medications log.


Preoperative Information Form


The data contained in the preoperative form are very
specific to the nature of the surgical trial. The information
that must be included will be outlined in the protocol, which
is your best resource for determining what to include. This
form is also specific to surgical randomized controlled
trials, in which the patient is receiving a surgical
intervention. For the FLOW study, the preoperative
information collected included fracture characteristics such
as the type of fracture, bone loss, and classification of the
fracture. Figure 40.5 shows part of the preoperative
information form used in the FLOW study.


2. Type of fracture (check all that apply):
   Comminuted / Segmental / Transverse / Spiral / Oblique
3. Involvement of joint: Intra-articular / Extra-articular
4. Bone loss: Yes / No
   If yes: 0 cm / 1 cm / 2 cm / 3 cm / 4 cm / 5 cm / 6 cm or more
5. OTA classification of fractures (refer to booklet or see
   www.ota.org/compendium/compendium.html):

Fig. 40.5 A sample section of a preoperative form.


Surgical Report


The surgical report form provides an opportunity to collect
surgery-related study data. This form is also very specific to
surgical randomized controlled trials. As with the
preoperative information form, the information that must be
contained in the surgical report form will be specified in the
protocol. These forms vary widely between studies; however,
some basic questions include date of surgery, name
of attending surgeon, person who performed the majority of
the surgery (i.e., surgeon, resident, fellow), procedure
specifications, implant specifications, irrigation and
debridement, total operative time, surgical delay at hospital,
additional procedures, complications, and unexpected events.
To summarize, any data specified in the protocol relating
to the surgery should be put into the surgical report
form. Figure 40.6 shows part of the surgical report form
used in the FLOW study.


15. Was tourniquet used: Yes / No
    If yes, specify: Time (minutes); Inflation pressure (mmHg)
16. Cortical continuity following fixation: 0% / 25% / 50% / 75% / 100%
17. Size of postoperative fracture gap: <1 cm / 1-5 cm / >5 cm
18. Total operative time for affected limb: (hours) (minutes)
19. Time to surgery from injury: (hours) (minutes)
20. Time to surgery from hospital admission: (hours) (minutes)
*If time to surgery is >24 hours from time of injury, complete a
 Protocol Deviation Form (Fig. 40.9), see p. 254

Fig. 40.6 A sample section of a surgical report form.


Postoperative Information Form


The postoperative information form includes any data
relating to the time after the surgery. This, again, is specific
to a surgical randomized controlled trial. The information to
include can be found in the protocol. For example, data
concerning antibiotics, discharge, and additional procedures
may be found on this form. Figure 40.7 shows the beginning of
the postoperative form used in the FLOW study.


Section A: Antibiotics

1. Were the appropriate antibiotics given according to the
   Antibiotic Protocol (see below)?
   Yes → Record all antibiotics on the Medications Log
   No → Complete a Protocol Deviation Form 10.1

ANTIBIOTIC PROTOCOL
Preoperative: IV antibiotics must be administered commencing on
diagnosis.
Postoperative: IV antibiotics must be administered from 1-4 days
post-surgery followed by an oral course of antibiotics for a
duration of 1-6 days.
Type: The type of antibiotic must cover gram positive bacteria and,
in cases of significant contamination, gram negative and anaerobic
bacteria.

Fig. 40.7 A sample section of a postoperative form.


Follow-up Report 


The follow-up report simply includes all data that must be
collected at follow-up visits. The follow-up schedule will
be outlined in the study protocol, as well as the tests that
must be done and data that must be collected at each visit.

Often, the follow-up schedule will include some clinic
visits as well as some phone calls. For example, in a study
with three follow-up dates (1, 2, and 3 months), the first
and third follow-ups (1st month and 3rd month) may require
X-rays and quality of life measures, whereas the second
follow-up (2nd month) may only include a quality of life
phone interview. A helpful suggestion is to include a cover
page for each follow-up that collects data such as the date
of the follow-up (or whether it was missed) and whether it
was complete. You may also want to collect information on
protocol deviations, adverse events, and/or additional
outcomes. This form is very important and will provide the
bulk of the information used to assess the outcomes. The
information that must be collected at follow-up will be
outlined in the protocol and should be adhered to.
Figure 40.8 shows the beginning of the follow-up form
used in the FLOW study.
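
The three-visit example above can be turned into a small schedule calculator. This is a sketch under assumptions: the text does not say from which date follow-up is counted, so the surgery date is used here, and the `SCHEDULE` table and function names are hypothetical.

```python
import calendar
from datetime import date

# Hypothetical schedule taken from the three-visit example in the text:
# (months after surgery, type of contact, assessments required).
SCHEDULE = [
    (1, "clinic", ["X-ray", "quality of life"]),
    (2, "phone", ["quality of life"]),
    (3, "clinic", ["X-ray", "quality of life"]),
]


def add_months(start: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of short months."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def follow_up_plan(surgery_date: date) -> list[dict]:
    """Return the due date, contact type, and assessments for each visit."""
    return [
        {"due": add_months(surgery_date, months), "contact": contact,
         "assessments": tests}
        for months, contact, tests in SCHEDULE
    ]
```

A plan generated this way can feed the cover page of each follow-up form and the overdue-visit reports produced by the methods center.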





FLOW FOLLOW-UP REPORT FORM (1 of 3) - FORM 8.1

STUDY #082   Plate #070
Patient Study ID Number: Centre # ____  Patient # ____     Patient Initials: F __  L __
Follow-up: |_| Discharge  |_| 2 weeks  |_| 6 weeks  |_| 3 months  |_| 6 months  |_| 9 months

Section A: Patient Specific Information

1. Date of follow-up (Day / Month / Year): __ / __ / 20__
2. Location: |_| In person  |_| Telephone (only if patient is unable to return to clinic)
3. FOR EACH of the following, please indicate if the interview
   was completed AND which version was administered:

   Interview      Complete?      Version administered?
                  Yes    No      Self Admin    Interviewer Admin
   SF-12v2        |_|    |_|     |_|           |_|
   EuroQol-5D     |_|    |_|     |_|           |_|

4. Visit status:
   |_| Complete: all required data collection forms and questionnaires completed
   |_| Partially complete (please specify)

Fig. 40.8 A sample section of a follow-up form.
Surgical Procedures

1. Wrong pressure used?
   |_| Yes -> If yes, please explain:
   |_| No

2. Wrong irrigation solution additive used?
   |_| Yes -> If yes, please explain:
   |_| No

Fig. 40.9 A sample section of a protocol deviation form.


Protocol Deviation Form 


The protocol deviation form is a way to record all events
that contradict the study protocol.
This form is extremely important because protocol
deviations have the potential to bias the results of the
study if they are not recorded properly. It would be great
if all trials were perfect and everything was always done
according to protocol, but we accept our humanity and
realize that this is never the case. Therefore, it is important to
have a well-designed protocol deviation form to record any
mistakes that occur in the trial. It is helpful to organize the
protocol deviation form using subtitles that correspond to
the other CRFs. Figure 40.9 shows the beginning of the
protocol deviation form used in the FLOW study.


Adverse Events Form 


This form provides the clinician with the opportunity to re- 
cord any adverse events that may occur after the patient’s 
enrollment into the trial. It is important to record the date 
of the adverse event on the form as well as a description of 
the event. Specific adverse events that are important to 
your trial can be found in the protocol. You also may 
want to use this form to record the date and explanation 
of any rehospitalizations. It is also important to include 
the outcome of the adverse event, documenting whether 
it was resolved or not. Lastly, this page can include whether 
the attending physician believes the adverse event to be 
related to the study. Figure 40.10 shows the beginning of 
the adverse event form used in the FLOW study. 








1. Has the patient had any adverse events since the initial surgery or their last follow-up?
   |_| Yes -> Please describe the adverse events below. Please complete a separate form for each event.
   |_| No

2. Date the adverse event was diagnosed (Day / Month / Year): __ / __ / 20__

3. Please specify the adverse event (check only one). If there are multiple adverse events, please complete a separate form for each adverse event.
   |_| Death -> Please complete an Early Withdrawal Form 14.1
   |_| Deep vein thrombosis
   |_| Sepsis
   |_| Pulmonary embolism
   |_| Pneumonia
   |_| Acute respiratory distress syndrome
   |_| Multi-organ failure
   |_| Prolonged hospitalization
   |_| Hardware failure, specify location(s):
   |_| Operative adverse event, specify:
   |_| Other:

Fig. 40.10 A sample section of an adverse event form.
EARLY WITHDRAWAL FORM (1 of 1) - FORM 14.1

1. Date of withdrawal from study (Day / Month / Year): __ / __ / 20__

2. Reason for withdrawal from study:
   |_| Randomized patient without consent
   |_| Randomized a patient we cannot legally follow
   |_| Patient improperly randomized
   |_| Death -> Please complete an Adverse Event Form 12.1
   |_| Unable to locate -> please note that a patient is considered “unable to locate” only after all resources have been
       exhausted in trying to find the patient
   |_| Patient withdrew consent -> please provide explanation under comments section below
   |_| Other -> please specify:

Comments:

Fig. 40.11 A sample section of an early withdrawal form.


Key Concepts: Information Contained in the Adverse
Events Form

• Date of adverse event

• Specify adverse event

• Hospitalizations (date and reason)

• Relation between adverse event and study

• Resolution of the adverse event
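
A minimal sketch of how the items listed above might be held in a single record and checked for completeness follows. The field names and the `missing_fields` helper are assumptions made for illustration; they are not part of the FLOW forms.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AdverseEvent:
    event_date: date                            # date of adverse event
    event_type: str                             # e.g. "Pneumonia"
    related_to_study: Optional[bool] = None     # attending physician's opinion
    resolved: Optional[bool] = None             # outcome of the event
    rehospitalization_date: Optional[date] = None
    rehospitalization_reason: str = ""


def missing_fields(event, enrollment_date):
    """Return a list of problems that a data manager would query."""
    problems = []
    if event.event_date < enrollment_date:
        problems.append("event date precedes enrollment")
    if not event.event_type.strip():
        problems.append("adverse event not specified")
    if event.related_to_study is None:
        problems.append("relation to study not recorded")
    if event.resolved is None:
        problems.append("resolution not recorded")
    if event.rehospitalization_date is not None and not event.rehospitalization_reason.strip():
        problems.append("rehospitalization reason missing")
    return problems
```

An empty list means every item named in the box has been recorded; anything else can be sent back to the site as a query.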


Early Withdrawal


Sometimes study participants will choose to withdraw
from the study, even though they previously agreed to
participate in the trial. It would not be in good form to forget
about these patients altogether and pretend as if they were
never in the study. Instead, we must keep track of these
patients (as they must be reported in later analyses) and find
out why they have chosen to stop participating in the
research study. This form includes two pieces of information:
the date of withdrawal and the reason for withdrawal. You
may be able to salvage some of the information pertaining
to the patient by obtaining any critical information from
their chart. Figure 40.11 shows part of the early
withdrawal form used in the FLOW study.


Stage 3: Designing the Style


The style and layout of the CRF is just as important as the
content it contains. Remember, each patient involved in
the study will have many CRFs; therefore, the proper
labeling and organization of the forms is critical. Each CRF
should be standardized through the header and footer.
The header should include the study name and number,
the site number, participant number, participant initials,
form number, and a fill-in date for the form. The footer
should include the date of printing and the revision
number. To use one set of CRFs for each patient at each center,
there are usually blank boxes for center number, patient
number, patient initials, and date. Figure 40.1 provides
an excellent illustration of a CRF header.

It is important to remember that the CRF may be completed
by multiple clinicians and clinical research coordinators
at multiple centers; therefore, the forms must be as clear
and comprehensive as possible. To assist clinicians, it is
important to include clear and concise instructions for
completing the forms. Too much writing on one page can be
intimidating and will deter people from reading the
instructions; therefore, make them brief and to the point. A
bulleted action form works well here, with bold and italics
to stress some information.¹ Refer to Table 40.1 for further
details of what to include in the instructions.


Table 40.1 Examples from the Literature: Example
Instructions for Completing the Case Report Form (CRF)

Completing the CRF

• Please print legibly using black ink.
• Record patient ID and patient initials (First/Last) on all forms.
• All text and explanatory comments should be brief and within
  the space provided.
• Use bold and italics to stress important information.
• Only enter data in the fields provided.
• If the answer is zero, do not leave the field blank, write “0.”
• If the data are missing, print “M” beside the boxes.

Clinical Site Numbering

• Each site is assigned a specific number by the study coordinator.
• This number will be recorded on the top of each CRF before
  the patient number.
• Example: Center A is assigned number 1, center B is assigned
  number 2

Patient Numbering

• Included patient study ID numbers are assigned randomly by
  the computerized randomization system.
• Included patient numbers start at 1001, increment sequentially,
  and can go as high as 1999 within any one center.
• Missed patient study ID numbers are assigned by individual
  site coordinators.
• Missed patient study numbers start at 2001, increment
  sequentially, and can go as high as 2999 within any one center.
• Excluded patient study ID numbers are assigned by individual
  site coordinators.
• Excluded patient study numbers start at 3001, increment
  sequentially, and can go as high as necessary.
• Example: The first randomized patient at center 1 will be 001
  1001
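
The site and patient numbering rules in Table 40.1 can be sketched in a few lines. The number ranges and the "001 1001" example come from the table; the function names and the three-digit zero padding of the center number (inferred from that one example) are assumptions.

```python
# Ranges taken from Table 40.1; an upper bound of None means
# "as high as necessary".
ID_RANGES = {
    "included": (1001, 1999),
    "missed": (2001, 2999),
    "excluded": (3001, None),
}


def study_id(center, status, sequence):
    """Build 'CCC NNNN' for the sequence-th patient (1-based) of a status."""
    start, end = ID_RANGES[status]
    number = start + sequence - 1
    if sequence < 1 or (end is not None and number > end):
        raise ValueError(f"no {status} ID available for sequence {sequence}")
    return f"{center:03d} {number}"


def status_of(patient_number):
    """Recover the patient's status from the numeric part of the ID."""
    for status, (start, end) in ID_RANGES.items():
        if patient_number >= start and (end is None or patient_number <= end):
            return status
    raise ValueError(f"{patient_number} is outside every defined range")
```

Because each status has its own range, the status of a patient can be read straight off the ID, which is the point of keeping the ranges disjoint.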

There are further visual and aesthetic points to consider
when designing the CRF. Check boxes for recording
answers are more useful than circling answers because boxes
provide visual cues and circled answers can be difficult to
interpret.¹ Using boxes also simplifies and helps quantify the
data, making them easier to enter in the database and
analyze. However, there are times when written answers are
unavoidable (e.g., on the protocol deviation
form). Consider the density of questions on a page. Too
many questions per page, although more cost efficient, can result
in missed questions and more headaches in the end. Too
few questions per page can annoy clinicians or patients
and lead to incomplete forms. It is important to find a middle ground.
Lastly, font styles and sizes can assist in the reading and
comprehension of the forms. Stress important words or
phrases using bold, italics, or underlining.¹ This will help
the clinician filling out the forms by drawing their
attention to important information. Make sure the font size is
large enough to be read by clinicians (i.e., a minimum of
10 point and a maximum of 14 point).


Key Concepts: Design Standards - Style

• Include explicit instructions at the beginning of the
  CRF document and throughout the forms.

• Header: Study and site number, participant number
  and initials, form number, date

• Footer: Date of printing, revision number

• Use boxes to hold answers whenever possible.

• Consider the density of questions on each page.

• Bold, italicize, or underline important words, titles,
  and phrases.

• Font size should be 10 to 14 point.


Stage 4: Review 


Once the CRF is designed, it is necessary to review the forms
using various techniques. Four stages of reviewing can be
applied to ensure the forms are of the best quality. The first
step is to carefully and meticulously review the forms for
spelling, grammar, consistency, comprehension, and
clarity. In this self-evaluation phase, it is important to
review the protocol and double check that all issues have
been addressed. Read your forms with a critical eye,
ensuring that they flow well and would be understood by
another reader. Following self-evaluation, have a few (three
or four) peers and/or experts in the field of the study
review the forms. The feedback from these reviewers should
then be considered and possibly incorporated into the forms
before continuing any further.

Next, the forms should be carefully reviewed by the data
analysis team to highlight any significant effects the data
may have on the analysis. This team may include the
data manager, the data analyst, and the senior statistician.
Remember, the purpose of the CRF is to collect all the data
necessary for answering the research question. Therefore,
it is a good idea to plan an analysis strategy and ensure
that the forms address the analysis appropriately.

The last step in reviewing the forms is to pilot test them
on a small number of patients who present to the center
and pass the eligibility criteria prior to the start of the large
trial.² This small pilot test will be very indicative of how
well the CRF works.


Jargon Simplified: Pilot Testing 

Pilot testing is a technique used after the development of 
the case report form, but before it is used in the clinical 
trial. By testing the CRF on a few patients, this process 
brings to light any of the kinks or mistakes within the 
form before it is applied to every patient in the trial. 
Keep in mind that pilot testing of the complete CRF may
not be practical in some circumstances. For example, in
trials with lengthy follow-up periods, pilot testing all
aspects of the form may not be feasible. In this case, pilot
only those forms that can feasibly be tested within the time
available for preparing the large trial.


Suggested Reading 


Bhandari M, Schemitsch EH. Planning a randomized trial: an overview. 
Tech Orthop 2004; 19:66-71 

Bhandari M, Schemitsch EH. Beyond the basics: the organization and 
coordination of multicenter trials. Tech Orthop 2004; 19:83-87 




Key Concepts: Four Stages of Reviewing 
1. Self-evaluation 

2. Peer evaluation 

3. Data analysis team review 

4. Pilot testing the forms 
41
The Study Manual of Operations


“Good plans shape good decisions. That’s why good planning helps to make elusive dreams
come true.”
- Lester R. Bittel


Summary


In this chapter, researchers are shown how to create a
manual of operations for clinical research coordinators.
The importance of creating a good manual of operations
for a clinical study is explained, as well as the 12 standard
sections that are typically included in a manual of
operations. These 12 sections are then outlined individually
and the important information to be included in each of
these sections is discussed.


Introduction


Clinical trials are often composed of many diverse
components that all work together to make these studies possible.
Therefore, it is important that a standard operating
procedure for a clinical trial be created. A standard operating
procedure describes the way that certain activities are to
be performed.¹ Creating a standard operating procedure
is important because it provides direction, improves
communication, reduces training time, and improves work
consistency.² This chapter focuses on the standard operating
procedures manual made for clinical research
coordinators, which is called a manual of operations. It is the
clinical research coordinator’s job to ensure that all clinical
and research aspects involving patients are performed smoothly. To
oversee such aspects, the clinical research coordinator
must possess extensive knowledge of how these aspects
of a clinical trial are to be performed. To assist the clinical
research coordinator with this task, a manual of operations
should be prepared. A manual of operations is a handbook
of instructions designed to guide the clinical research
coordinator in how to successfully carry out aspects of a clinical
trial according to study protocol. It is similar to the study
protocol in that it provides a comprehensive overview of
the study. However, the manual of operations differs
from the study protocol in that it is specifically geared
toward the role of the clinical research coordinator, whereas
the study protocol is more general, as it applies to all
people involved in the trial.

It is beneficial to a clinical trial that the clinical research
coordinator be able to answer, in a quick and efficient
manner, any questions that health professionals or patients
may have about the trial. A good manual of operations
will describe every part of the trial that the clinical
research coordinator is involved in. It will present the
information in a clear and concise manner, expanding on
matters only when necessary. It is crucial that clinical research
coordinators have a quick reference in times when a quick
response is needed, such as when randomizing a patient to
a type of surgical technique. The absence of a good manual of
operations can lead to confusion, miscommunication,
or mistakes. Therefore, all matters
that are of concern to the clinical research coordinator
should be included in the manual of operations to avoid
any negative outcomes. In this chapter, those crucial topics
that must be included in a manual of operations designed
for clinical randomized controlled trials are outlined.


Jargon Simplified: Standard Operating Procedure

A standard operating procedure is a set of written
instructions that describes how to properly perform
certain activities that are to be conducted by an
organization.


Key Concepts: Importance of Creating a Standard
Operating Procedure

Standard operating procedures are important in a
clinical trial because they:

• Provide direction

• Improve communication

• Reduce training time

• Improve work consistency


Jargon Simplified: Manual of Operations

A manual of operations is a handbook of instructions
designed to guide the clinical research coordinator in how
to successfully carry out aspects of a clinical trial according
to study protocol.


The box below shows the standard topics that should be 
included in a manual of operations for a clinical trial. 
Key Concepts: The Standard Topics Included in a
Manual of Operations

1. Background information
2. Objectives
3. Communication
4. Evaluating patient status
5. Randomization
6. Interventions
7. Adverse events
8. Patient follow-up
9. Protocol deviations
10. Trial administration
11. Center reimbursement
12. Troubleshooting


Background Information


To begin a manual of operations, it is a good idea to refresh
the clinical research coordinator’s memory of what the trial
is about. This brief summary of the study should
incorporate the topic being researched, statistics and past research
done on this topic, and the potential significance this trial
could have on future clinical practice. These first few
paragraphs can be thought of as a condensed version of the
study protocol. The purpose of this section is to give the
clinical research coordinator some basic background
knowledge of the trial, so that they understand the
importance of the study and can answer general questions about
why this study is taking place.


Objectives


The next section should state the primary and secondary
objectives of the clinical trial. This section need only be a
few sentences, as its sole purpose is to remind the clinical
research coordinator what variables the study is looking
at and what information the case report forms are trying
to capture.


Communication


Although the manual of operations should be self-explanatory
and encompass every issue a clinical research
coordinator will ever have to deal with, this is not always the case.
Some instructions may appear confusing or ambiguous, or
perhaps some issues have come up that were not included
at the time of writing the manual of operations. Therefore,
in times of uncertainty, the clinical research coordinator
ought to have contact information, so that they know
whom to contact when they have a question about the
trial. In multicenter trials, the main contacts will be from
the trial’s central methods center. The central methods
center is the original site from which the study is
conducted. All important documents, such as the study
protocol and case report forms, are developed there and all data
collected are sent to the central methods center to be
processed and analyzed. The contacts from the central
methods center should be able to answer any questions or
concerns the clinical research coordinator may have about
the trial. Therefore, their contact information should be
documented in the manual of operations. This includes
the names of the main contact people, their daytime
contact information, and their after-hours contact information.
For daytime contact information, the office hours of
operation, telephone numbers, and e-mail addresses for these
main contacts should be provided. After-hours contact is
for urgent matters, and therefore an appropriate pager or
telephone number should be given to reach the central
methods center contacts in case of emergency.


Jargon Simplified: Central Methods Center

The central methods center is the original site from
which the study is conducted. All important
documents, such as the study protocol and case report forms,
are developed there and all data collected are sent to the
central methods center to be processed and analyzed.


Evaluating Patient Status


When enrolling patients into a clinical trial, it is imperative
that the clinical research coordinator know how to properly
identify potentially eligible patients. It is important that
all patients referred to each participating center who are
potential candidates for inclusion in the trial be identified.
From there, a decision must be made: either the candidate
meets the inclusion and exclusion criteria and is therefore
included in the trial, or he or she does not meet the
criteria and is therefore excluded from the trial.

Because it is not always the clinical research coordinator
who will be consenting and randomizing patients into the
trial, it is the job of the clinical research coordinator to
make everyone who may randomize patients
aware of the inclusion and exclusion criteria. Because all
studies are different, the clinical research coordinator
must find out all the places at their center where a
potentially eligible patient might present and all the
people who could potentially randomize this patient. For
example, in an orthopaedic clinical trial,
a potentially eligible patient might
present to the emergency department, hospital
ward, or outpatient department, and the people
likely to randomize this patient would be
participating surgeons, residents, fellows, and nurses. All of
this information should be included in the manual of
operations, so that the clinical research coordinator for each
participating site will know whom to inform of the inclusion
and exclusion criteria of the study and where to post
reminders.

After hours, most clinical research coordinators will rely
solely upon residents, fellows, or participating surgeons to
identify potentially eligible patients. That is why it is
important to provide reminders for anyone who is likely to
identify a potential study patient. Reminders can be in
the form of pocket cards and posters (Fig. 41.1a,b),
presentations at rounds, e-mails, or phone calls to the on-call
workers.

It is also the clinical research coordinator’s responsibility
to make up recruitment packages and place them in a
location that is convenient for the people randomizing patients.
A recruitment package consists of case report forms
and consent forms that must be completed at the time of
screening and randomization, as well as an instruction
sheet on how to correctly perform the treatment a patient
is randomized to. It is also helpful to include a checklist so that
the person randomizing can make sure that nothing has been
missed and that patients are being recruited
properly. A dictation script for the participating surgeon
is optional, but can be helpful depending on the study.

The manual of operations should also list the inclusion
and exclusion criteria for the study. The clinical research
coordinator should know these criteria, so that when
assessing a patient’s eligibility, they have a clear
understanding of the criteria they are looking for. In addition, when
someone has a question about the study’s inclusion and
exclusion criteria, the clinical research coordinator will
probably be the one to clarify or explain the criteria. The clinical
research coordinator will also need to verify that the
decision to include or exclude a patient is correct based on the
criteria.

Classification of a patient’s status should be explained to
the clinical research coordinator. Definitions of eligible
patients, missed patients, and excluded patients should be
provided along with how to identify each of these statuses.
The patient study identification number for eligible
patients should be different from the patient study
identification number for missed or excluded patients to
differentiate between all three groups. Therefore, the manual of
operations should outline which patient study identification
number to use depending on the patient’s status and what
form this information should be recorded on.


Jargon Simplified: Recruitment Package

A recruitment package consists of any documents that
must be completed at the time of screening and
randomization. Important forms to include are as follows:

• Checklist

• Screening form

• Randomization form

• Consent form

• Instruction sheet on how to correctly perform the
  treatment to which a patient is randomized

• Dictation script


Jargon Simplified: Dictation Script

A dictation script lists all of the information the clinical
research coordinator needs to complete the surgical case
report forms, so that when the surgeon dictates details
of the surgery, they will hopefully use the dictation
script as a guide and include as many of those items as
possible.


Jargon Simplified: Inclusion and Exclusion Criteria
“Study investigators specify the inclusion criteria to
define the population who will be eligible for the trial.”
Exclusion criteria are “criteria that render potential
subjects ineligible to participate in a particular trial.”


Randomization Procedure


Randomization is an important concept in clinical
randomized controlled trials, and it is important that the clinical
research coordinator understand how this process works.
It is helpful if the manual of operations explains how the
patients are being stratified and what the unit of
randomization is.

To be able to randomize a patient, the clinical research
coordinator must obtain informed consent and provide
the patient with an information sheet. The manual of
operations should note in this section that in the event that
a patient is unable to give informed consent, proxy consent
may be obtained.
(a) Poster

Fluid Irrigation Techniques in Patients with Open Fracture Wounds

Inclusion Criteria
1. Men or women age 18 or over.
2. Open fracture of the upper and/or lower extremities or pelvis.
3. Fracture treated within 24 hours.
4. Informed consent provided by patient or proxy.

Exclusion Criteria
1. Cognitive or language barriers that would limit completion of
   quality of life questionnaires in English.
2. Anticipated problems with follow-up in the judgment of the
   resident or attending surgeon.
3. Known allergy to detergent or castile soap ingredient.
4. Patient has Grade IIIC open fracture.
5. Traumatic amputation or amputation through the open wound.
6. Previous wound or bone infections, fractures, or retained
   hardware in the same extremity.
7. Surgeon refusal to randomize patient.
8. Previous randomization in this study or a competing study.
9. Immunosuppressive medication use.

Randomization System (24 hours)

Before accessing the randomization system, you must:
1. Apply inclusion/exclusion criteria.
2. Obtain signed patient consent.
3. Have the surgeon's agreement to randomize.
4. Know the patient's date of birth.
5. Know if patient has had previous wound or bone infections,
   fractures, or retained hardware in the same extremity.
8. Have a pen ready.

To access the randomization system either:
1. Call 905-555-1111. Authorization code: S678
2. http://www.clinicaltrial.com/FLOW  Login Name: site  Password: site

(b) Pocket card

Fluid Irrigation Techniques in Patients with Open Fracture Wounds

Please ensure that the patient meets the following inclusion/exclusion criteria.

Inclusion Criteria:
1. Men or women age 18 or over.
2. Open fracture of the upper and/or lower extremities or pelvis.
3. Fracture treated within 24 hours.
4. Informed consent provided by patient or proxy.

Exclusion Criteria:
1. Cognitive or language barriers that would limit completion of quality of life
   questionnaires in English.
2. Anticipated problems with follow-up in the judgment of the resident or attending
   surgeon.
3. Known allergy to detergents or castile soap ingredients.
4. Patient has Grade IIIC open fracture.
5. Traumatic amputation or amputation through the open wound.
6. Previous wound or bone infections, fractures, or retained hardware in the same extremity.
7. Surgeon refusal to randomize patient.
8. Previous randomization in this study or a competing study.
9. Immunosuppressive medication use.

Fig. 41.1 (a) Sample poster.
(b) Sample pocket card.
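
To show how the criteria on the poster and pocket card translate into a screening decision, here is a minimal sketch. The criteria themselves are those listed in Fig. 41.1; the `Candidate` fields, the short exclusion labels, and the `screen` function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field

# Short, hypothetical labels for the nine exclusion criteria in Fig. 41.1.
EXCLUSION_CRITERIA = {
    "language_barrier", "follow_up_problems", "soap_allergy", "grade_IIIC",
    "amputation", "prior_infection_fracture_or_hardware", "surgeon_refusal",
    "prior_randomization", "immunosuppressive_medication",
}


@dataclass
class Candidate:
    age: int
    open_fracture: bool              # upper/lower extremity or pelvis
    hours_until_treatment: float     # "fracture treated within 24 hours"
    consent: bool                    # from patient or proxy
    exclusions: set = field(default_factory=set)


def screen(candidate):
    """Return (eligible, reasons); reasons is empty when eligible."""
    unknown = candidate.exclusions - EXCLUSION_CRITERIA
    if unknown:
        raise ValueError(f"unknown exclusion labels: {sorted(unknown)}")
    reasons = []
    if candidate.age < 18:
        reasons.append("under 18")
    if not candidate.open_fracture:
        reasons.append("no qualifying open fracture")
    if candidate.hours_until_treatment > 24:
        reasons.append("not treated within 24 hours")
    if not candidate.consent:
        reasons.append("no informed consent")
    reasons.extend(sorted(candidate.exclusions))
    return (not reasons, reasons)
```

Returning the reasons alongside the decision makes it easy to record why a patient was excluded on the screening form.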
Jargon Simplified: Proxy Consent
Proxy consent is when a spouse or immediate family 
member gives consent on behalf of the patient. 


Information about the randomization procedure is also 
important to include in this section. This includes the pos- 
sible treatment allocations, instructions on how to access 
the randomization system, and the information the clinical 
research coordinator must have to randomize the patient. 
Phone numbers or Web site addresses used to access the 
randomization system should be included, as well as con- 
tact information in case difficulties are experienced. Any 
forms the clinical research coordinator must complete at 
the time of randomization must also be mentioned in 
this section. 
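
The text does not say how the randomization system generates allocations. Purely as an illustration of what stratification can look like in practice, the sketch below uses permuted blocks within each stratum, which is one common approach and not necessarily the one used in any particular trial. The class name, arm labels, block size, and stratum labels are all assumptions.

```python
import random
from collections import defaultdict


class StratifiedBlockRandomizer:
    """Permuted-block allocation kept separately for each stratum."""

    def __init__(self, arms, block_size, seed=None):
        if block_size % len(arms) != 0:
            raise ValueError("block size must be a multiple of the number of arms")
        self.arms = list(arms)
        self.block_size = block_size
        self.rng = random.Random(seed)
        # stratum -> allocations still unused in that stratum's current block
        self.pending = defaultdict(list)

    def allocate(self, stratum):
        """Return the next treatment arm for a unit in the given stratum."""
        block = self.pending[stratum]
        if not block:
            # Start a new block with equal numbers of each arm, then shuffle.
            block.extend(self.arms * (self.block_size // len(self.arms)))
            self.rng.shuffle(block)
        return block.pop()
```

Within every completed block the arms are balanced, so each stratum ends up with roughly equal numbers in each treatment group.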

Below are sample instructions given to a clinical re- 
search coordinator for an orthopaedic clinical trial. 


Key Concepts: Randomization Instructions

Before randomizing a patient, you must:

1. Apply inclusion/exclusion criteria to the patient

2. Obtain signed patient consent

3. Have the surgeon’s agreement to randomize

4. Have the patient’s date of birth

5. Know if the patient has had previous wound or
   bone infection, operatively treated fracture, or
   retained hardware in the same extremity

6. Know the most severe grade of fracture you are
   randomizing: I, II, IIIA, or IIIB

7. Have an authorization code from the central
   methods center

8. Have a pen ready


Jargon Simplified: Stratification
To stratify patients means to separate patients into
different groups based on a single variable.


Jargon Simplified: Unit of Randomization
The unit of randomization is the element that you are
randomizing to a treatment group. For example, if the
unit of randomization is the patient, then the patient
will be randomized to receive a certain treatment; if
the unit of randomization is the fracture, then each
fracture that meets the inclusion criteria will be randomized
to a certain treatment group.


Jargon Simplified: Authorization Code
An authorization code is a string of letters and/or
numbers that allows a person to access the randomization
system.


Documentation of Eligible Patient Status


One of the main roles of the clinical research coordinator is
to ensure that the appropriate case report forms are
completed at the appropriate time. The case report forms used
for each trial will differ, but typical information the clinical
research coordinator will need to find is patient contact
information, baseline characteristics, medications the
patient is taking throughout the course of the study, and
other key information related to the study. For each of
these forms, instructions should be given on what
information is required to complete them and where to find
this information. Information can generally be obtained
from the surgeon, the patient, or a clinical database. If a
patient is discovered to be ineligible after randomization has
already occurred, instructions on how to follow such
patients for the remainder of the study should be given if
an intention-to-treat analysis is planned.


Interventions


This section of the manual of operations describes the
treatment groups that patients can be randomized to. All
potential intervention groups should be listed, so that
the clinical research coordinator is aware of all the possible
treatment groups a patient can be randomized to. It is
crucial that these treatments are performed correctly because
the results of the trial depend on the results of those
treatments. Therefore, the manual of operations
should explain how the treatment interventions are
expected to be performed according to the study protocol.
Any case report forms that record the type of treatment
used should also be mentioned. This section of the manual
of operations can also be used to instruct participating
surgeons if it is a surgical trial, or pharmacists if it is a drug
study.


Jargon Simplified: Intention-to-Treat Analysis

An intention-to-treat analysis examines the results of
the study in terms of which treatment group the
patients were randomized to as opposed to which
treatment the patients actually received.

If any particular perioperative care is standardized, then 
the manual of operations should specify this care accord- 
ing to protocol. Again, any case report forms related to 
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perioperative care must be mentioned so that this infor- 
mation can be accurately recorded. It should be noted 


Adverse Events 


The clinical research coordinator is to record all operative 
and nonoperative adverse events that occur over the fol- 
low-up period. If a patient dies, if any adverse events occur 
in the operating room during the initial treatment, or ifany 
adverse events occur during the study, then the case report 


Patient Follow-Up 


The clinical research coordinator is responsible for attend- 
ing patient follow-up visits and collecting the appropriate 
information. The typical follow-up time period for a clini- 
cal study is 1 year. During the study follow-up period, there 
will be clinic appointments at specific time points that 
both the patient and the clinical research coordinator are 
expected to attend. These time points should be clearly 
listed in the manual of operations, so that the clinical re- 
search coordinator knows when the patients are to be 
scheduled back at the clinic. This section should clearly 
specify the information that the clinical research coordina- 
tor is expected to document at each follow-up visit. A re- 
minder should also be included in this section that states 
that it is the clinical research coordinator’s responsibility 
to ensure that each patient has pre-arranged appoint- 
ments in the attending doctor’s clinic that fall within the 
trial’s time points. Clinical research coordinators are to 
call patients a few days before their appointments to re- 
mind them of their important follow-up assessment. 

A sample patient reminder form is given in Fig. 41.2. 
This is helpful for patients as well as the clinical research 
coordinator to remember the target dates for each fol- 
low-up visit. 


Outcomes 


The manual of operations should outline the primary out- 
come of the study, including at which follow-up time 
points the primary outcome will be assessed, how to iden- 
tify the primary outcome, and any forms used to document 
the primary outcome. Secondary outcomes should also be 
outlined in the same manner as the primary outcome. 


Adjudication 
Primary outcomes, secondary outcomes, and adverse 


events may be adjudicated by an independent outcomes 
adjudication committee (refer to Chapter 38). This com- 
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that any violations in the treatment or perioperative proto- 
col must be recorded on the protocol deviation form. 


forms that consequently must be completed by the clinical 
research coordinator should be listed here. A reminder 
should be made in this section for the clinical research co- 
ordinator to immediately report any adverse events to the 
local institutional review board or research ethics board. 


mittee will review all documentation and may request ad- 
ditional relevant information from the clinical sites if ne- 
cessary. Therefore, the clinical research coordinator must 
ensure that they have all the information required for the 
committee to be able to adjudicate cases. Necessary docu- 
mentation the committee may need includes case report 
forms, radiograph reports, operative notes, surgical con- 
sultation notes, clinic follow-up notes, and digital photo- 
graphs of specific radiographs, depending on what is being 
adjudicated. It is important that this is mentioned in the 
manual of operations so that the clinical research coordi- 
nator can keep this in mind when collecting information. 


Patient Health-Related Quality of Life 
and Outcomes 


Any questionnaires that are to be administered to patients 
ought to be listed in the follow-up section with a brief de- 
scription of what each questionnaire measures, how many 
items each questionnaire is composed of, and how long 
each questionnaire takes to administer. Instructions on 
how to properly administer a questionnaire should be gi- 
ven, as it is the clinical research coordinator who will be 
administering them. General tips to include are to never 
help the patient come up with an answer, to let the patient 
know that there is no right or wrong answer, to be neutral 
in your response to patients’ answers, and to watch for
inconsistencies in patients’ answers. For example, if in one
question a patient answers that their health is excellent,
yet indicates in another question that their health limits
them a lot when doing moderate activities, then the clinical
research coordinator should verify these answers with
the patient by restating the patient’s answers to the
conflicting questions and then asking if these answers are
correct.


Patient Name, Hospital ID: John Smith, 123456789
Date of enrolment: August 1, 2006

John, you are enrolled in the FLOW STUDY!
(Fluid Irrigation Techniques in Patients with Open Fracture Wounds)

PLEASE RETURN FOR FOLLOW-UP AND X-RAY WITH DR. BONES @ THE FRACTURE
CLINIC AT THE GENERAL HOSPITAL AS CLOSE TO THE FOLLOWING DAYS AS
POSSIBLE.

Follow-up              Weeks from Last Appt.   Target Date
2-wk post-surgery      2 weeks                 August 15, 2006
6-wk post-surgery      4 weeks                 September 12, 2006
3-mth post-surgery     7 weeks                 October 31, 2006
6-mth post-surgery     13 weeks                January 30, 2007
9-mth post-surgery     13 weeks                May 1, 2007
12-mth post-surgery    13 weeks                July 31, 2007

PLEASE MAKE YOUR APPOINTMENT FOR ONE OF THE FOLLOWING DATES:
Sept. 5/06     Sept. 12/06     Sept. 19/06

If you have any questions or concerns please contact Dr. Bones’ office at (905) 555-1234.

Fig. 41.2 Sample patient reminder form. 
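
The target dates on a reminder form such as the one in Fig. 41.2 follow from the enrolment date and the number of weeks between appointments. The short Python sketch below reproduces that arithmetic for the sample form; the function name and the layout of the schedule list are illustrative assumptions, not part of any trial software.

```python
from datetime import date, timedelta

# Visit labels and weeks since the previous appointment, as listed in Fig. 41.2.
SCHEDULE = [
    ("2-wk post-surgery", 2),
    ("6-wk post-surgery", 4),
    ("3-mth post-surgery", 7),
    ("6-mth post-surgery", 13),
    ("9-mth post-surgery", 13),
    ("12-mth post-surgery", 13),
]


def target_dates(enrolment, schedule=SCHEDULE):
    """Return (label, target date) pairs; each interval is added to the previous target."""
    targets = []
    current = enrolment
    for label, weeks in schedule:
        current = current + timedelta(weeks=weeks)
        targets.append((label, current))
    return targets


# Enrolment date taken from the sample form.
for label, target in target_dates(date(2006, 8, 1)):
    print(f"{label:<20} {target.isoformat()}")
```

Keeping the intervals in one list means a different trial schedule only requires editing that list.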





Key Concepts: General Tips for Administering Quality 
of Life Questionnaires 

General tips to include in a manual of operations for 
properly administering quality of life questionnaires 
are as follows: 

• Never help the patient come up with an answer.

• Let the patient know that there is no right or wrong
  answer.

• Be neutral in your response to patients’ answers.

• Watch for inconsistencies in patients’ answers. For
  example, if in one question a patient answers that their
  health is excellent, yet indicates in another question
  that their health limits them a lot when doing moderate
  activities, then the clinical research coordinator
  should verify these answers.

• Verify inconsistent answers by restating the patient’s
  answers to the conflicting questions and then asking
  if these answers are correct.


Preventing Loss to Follow-Up 


Unfortunately, even the best-designed clinical trials are 
subject to losing patients. Therefore, the manual of operations
should give instructions to the clinical research coordinator
on how to proceed in the event that a patient
misses a follow-up visit. The clinical research coordinator 
should phone the patient immediately and reschedule 
the appointment to fall within the acceptable time points. 
Classification of acceptable ranges for each designated fol- 
low-up visit should be listed, so that the clinical research 
coordinator will know whether the patient can still be re- 
scheduled for a follow-up appointment that will fit within 
the acceptable timeframe or whether any rescheduling 
will fall outside of the acceptable timeframe. In the latter
case, the follow-up would have to be officially documented
as a missed follow-up. For example, suppose that 6 months
postsurgery is one of the study’s time points and the
acceptable range for this visit is 5 to 7 months postsurgery.
If the patient misses his or her 6-month postsurgery follow-up
visit and reschedules for 7 months postsurgery,
then that appointment would count as the 6-month fol- 
low-up appointment because it falls within the acceptable 
range. However, if the patient reschedules for 8-months 
postsurgery, then this would fall outside of the acceptable 
timeframe and the patient’s 6-month follow-up would be 
counted as missed. 
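
The window rule in this example can be written down as a small lookup. The following Python sketch is illustrative only: the function and dictionary names are assumptions, and only the 5- to 7-month range for the 6-month visit comes from the text.

```python
# Acceptable range (in months postsurgery) for each scheduled visit.
# Only the 6-month entry is taken from the example in the text.
WINDOWS = {"6-month": (5, 7)}


def classify_follow_up(visit, months_since_surgery, windows=WINDOWS):
    """Return 'completed' if the visit falls inside its window, else 'missed'."""
    low, high = windows[visit]
    if low <= months_since_surgery <= high:
        return "completed"
    return "missed"


print(classify_follow_up("6-month", 7))  # rescheduled inside the window
print(classify_follow_up("6-month", 8))  # rescheduled outside the window
```

A manual of operations would list one such range for every designated follow-up visit.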

When patients are unable to reschedule their appointments
within the confines of the acceptable time ranges for each
follow-up period, it is the clinical research coordinator’s job
to collect as much information as possible by telephone for the
specified follow-up period. Any questionnaires or questions
that can be answered by the patient should be asked over the
telephone.

If a patient lives in another city, state, or country, the
clinical research coordinator must find out from the local
attending physician the name of the physician to whom the care
of the patient has been transferred. The clinical research
coordinator can then send the appropriate forms to be filled
out by the receiving physician.


Final Follow-Up Visit


Any examinations, forms, or questionnaires that are specific to
the final follow-up visit should be listed in this section as
well.


Protocol Deviations


In the event that a protocol deviation occurs at any stage of
administering the study methodology, a protocol deviation form
is generally filled out to track these deviations. Any other
forms the clinical research coordinator must fill out for these
deviations should be mentioned. The manual of operations should
note that after a protocol deviation, follow-up visits will be
continued as usual in the same way as for other randomized
patients in the study.


Trial Administration


The completion of case report forms is vital in a clinical
trial. The manual of operations can be used as an instruction
manual on how the clinical research coordinator can best
complete these forms. Instructions on how to fill out the forms
neatly and consistently, and how a patient should correctly
change their answer if desired are useful items to include.
When filling out forms it is important that there are no
patient identifiers on them and that all identification
numbers, patient initials, visit numbers (if required), and
date of visit are clearly entered in the corresponding places
on the case report forms.

The clinical research coordinator will also need to know how to
return the completed case report forms to the central methods
center. This section in the manual of operations should list
all forms with instructions on when and how they should be sent
back to the central methods center. Typically, case report
forms should be sent no later than 7 days from their
completion.

A clinical trial will have several trial committees that are
formed to oversee different aspects of the trial. The clinical
research coordinator should be aware of these committees, as
they will be responsible for providing information to each of
them.


Key Concepts: Examples of Possible Trial Committees

• A steering committee oversees all aspects of the trial
  and all trial committees.

• An adjudication committee reviews each event that is
  reported by an investigator during the trial and decides
  whether it is to be regarded as a study event.

• A data monitoring and safety committee consists of
  members who are not involved in or associated with
  the trial, whose responsibility is to ensure the safety
  of the patients enrolled in the trial.


The study coordinator of the central methods center will likely
be in charge of data management for all centers. This means
that they will have to prepare quality control reports for each
center. These quality control reports are important for
providing other sites with updates and reminders of tasks still
to be completed. The manual of operations should specify the
information that will be included in a quality control report
and how often these reports are to be faxed to each site.


Center Reimbursement


Another task of the clinical research coordinator is to keep 
track of center reimbursement. Center reimbursement is 
the payment provided to each participating center for their 
efforts on behalf of the trial. Typically, participating centers 
are compensated for the completion of study case report 
forms. The manual of operations should specify the total 
amount of money allocated to each participating site and 
then provide a breakdown of amounts given for the com- 
pletion of each form. The payment schedule should also 
be included, so that the clinical research coordinator can 
keep up-to-date on center reimbursements. The form of 
payment should also be noted. 

Table 41.1 shows an example of a payment schedule per 
patient for a center participating in a clinical trial. 
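
The arithmetic behind such a schedule is easy to check. The Python sketch below tallies the three payment amounts listed in Table 41.1; the milestone labels, the function names, and the figure of 25 enrolled patients are illustrative assumptions.

```python
# Payment per milestone, in dollars, using the amounts from Table 41.1.
PAYMENTS = [
    ("Randomization and baseline forms", 300),
    ("Follow-ups through 6 months", 500),
    ("9-month and 12-month follow-ups", 200),
]


def total_per_patient(payments=PAYMENTS):
    """Sum of all milestone payments for one fully followed patient."""
    return sum(amount for _, amount in payments)


def center_total(n_patients, payments=PAYMENTS):
    """Total for a center, assuming every patient completes every milestone."""
    return n_patients * total_per_patient(payments)


print(total_per_patient())  # matches the per-patient amount in the table footnote
print(center_total(25))     # hypothetical center with 25 fully followed patients
```

In practice a center earns less than this total whenever a milestone is not completed, which is why the breakdown per form matters.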


Table 41.1 Sample Payment Schedule per Patient*

Payment Amount   Responsibilities

$300             Expenses related to the case, when a patient is
                 randomized, and the following forms are completed
                 and faxed to the methods center:
                 - Screening form
                 - Randomization form
                 - Patient contact form
                 - Baseline characteristics form
                 - Surgical report form
                 - Hospital discharge form

$500             For each patient’s 6-month follow-up after the
                 completion of the 1-week, 2-week, 6-week, 3-month,
                 and 6-month follow-ups

$200             For each patient’s 12-month follow-up after the
                 completion of the 9-month and 12-month follow-ups

* Based upon each center receiving $1000 per patient enrolled in
the trial.


Troubleshooting


It is inevitable that the Clinical Research Coordinator will
come across problems or uncertainties in a clinical trial.
Although there are main contacts from the Central Methods
Center who can be contacted to resolve these problems, it is
good to look ahead and solve potential problems before they
occur. A good Manual of Operations will cover topics that have
the potential to cause a problem. For example, problems with
the randomization line, ambiguity in the inclusion and
exclusion criteria, patients with long hospital stays, or
ineligibility discovered after randomization are all important
problems to discuss. A good rule of thumb is to include any
information that may be the slightest bit confusing. If there
is any aspect or process of the trial that has the potential to
be misleading or to be a source of error, it is best to give
clear instructions to the Clinical Research Coordinator to
ensure full understanding of these concepts. Since it is not
always apparent which aspects will be troublesome,
troubleshooting instructions can be added as the trial
progresses.


Key Concepts: Troubleshooting in a Manual of
Operations

A good rule of thumb is to include any information that
may be the slightest bit confusing. If there is any aspect
or process of the trial that has the potential to be
misleading or to be a source of error, it is best to give clear
instructions to the Clinical Research Coordinator to
ensure full understanding of these concepts.


Appendix


The appendix is a good place to put any figures, tables, or
charts that will assist in the understanding of the Clinical
Research Coordinator. Important items to include in the
appendices are randomization information such as usernames and
passwords, and flow charts of the patient follow-up time
points.


Conclusions


A Manual of Operations is a vital document to create for a 
clinical trial as it helps the Clinical Research Coordinator 
ensure that the trial is performed as smoothly as possible. 
There are many important details of a clinical trial that a
Clinical Research Coordinator must have extensive knowledge
of; therefore, having a manual that includes all of this
information is of great use. Every trial is unique, but using
the twelve standard topics covered in this chapter as a basis
for making a Manual of Operations will assist a trial in
achieving success.


Review of Basic Statistical Principles


“The aim of statistical analysis is to use the information gained from a sample of
individuals to make inferences about the relevant population.”
(Altman, 1991)


Summary


The essential concepts to understanding the conduct and
interpretation of statistical analysis are described in this
chapter. The fundamental components of statistical analysis,
the data, how they are classified, and basic ways of
describing or summarizing the data are discussed. Making
correct inferences about the data depends on understanding
their variability, so basic ideas about how data are
distributed are presented. Finally, the two fundamental
statistical approaches, estimation and hypothesis testing, are
introduced and contrasted.


Introduction


An understanding of basic statistical principles is
fundamental to both conducting clinical research and properly
interpreting results, whether they are your own or those
published by others. As clinicians and researchers, we are
inundated with claims of association of risk factors for
certain diseases and assertions of benefit of certain
treatments for a given condition. How can one sort out
reasonable assertions from those that are untenable given the
data presented? Statistics are essential to rationally
interpreting data and determining what constitutes useful
evidence.


Key Concepts: The Purpose of Statistics

Statistical analysis cannot prove anything. The purpose
of statistics can more accurately be described as putting
limits on our uncertainty with respect to a research
question of interest for which data are gathered.


When analyzing health data for research purposes, the
intention is to extrapolate the findings from a (relatively
small) sample of individuals on whom we can afford to collect
data to a (much) larger population of similar individuals (on
whom we wish we could afford to collect data). Study subjects
act as proxies for the “target population,” i.e., the total
group of interest. Statistical analysis relies on the
assumption that these subjects are a representative sample of
the target population of interest. In comparative studies,
such as those that compare two treatments, the comparator
groups must come from the same target population and be as
similar as possible except for the treatment or risk factor of
interest to minimize bias. No statistical technique can
salvage valid results from data of poor quality or data with
unknown or unmeasured sources of bias. Addressing bias and
data quality are important considerations in study design and
conduct (discussed elsewhere in this text) and are essential
to the validity of statistical analysis.


Jargon Simplified: Samples and Populations 

Target population: The group of patients that the re- 
searcher wishes to describe or to which she or he hopes 
to apply research findings, for example, elderly women 
in the United States. 

Study sample: The subjects that are drawn from the tar- 
get population for study and from which statistical infer- 
ence is derived, for example, a random sample of 1000 
elderly women selected from all 50 U.S. states. 


Learning statistics can be challenging to researchers and
readers of the literature alike. The field of statistics, or
biostatistics as it is commonly referred to when applied to the
analysis of health research, is a strange mixture of
mathematics, logic, and judgment. Still, the basic necessities
are readily accessible to most clinicians and will be described
here in simple terms. All examples presented were derived for
heuristic purposes and do not represent actual data unless a
cited reference is provided. Data were generated and analyzed
using STATA version 9.2 statistical software (StataCorp,
College Station, TX).


Classifying Types of Data


Data are the basic ingredients of a statistical analysis. One 
must understand their form to appropriately describe 
them and any relationships that lie therein. Data describe 
various characteristics of an observation (a patient, subject, 
case, or incident) and come in two main forms: categorical 
or numerical. The basic components are often used to calculate
the frequency of occurrence of a particular event or
outcome in the form of a rate or proportion. These types of
data are described below. 


Categorical Data 


Also known as qualitative data, these data are described as 
having two or more categories. The simplest form of cate- 
gorical data has two categories only and typically indicates 
the presence or absence of some attribute. Examples in- 
clude: 

• Male/female
• Dead/alive
• Diabetic/nondiabetic
• Fracture union/fracture nonunion
• Implant survived/implant failed
• Smoker/nonsmoker


These data are often called binary. Note that although gen- 
der and mortality are unequivocal categories, being classi- 
fied as a smoker, diabetic, or having united a fracture relies 
on some more subjective cutoff point (number of cigar- 
ettes per day or number of cortices bridged), which re- 
quires a simplification of more complex data. Although 
the latter process is less favorable statistically because it in- 
volves a loss of information, it is often necessary for ease of 
interpretation. For example, it may be more desirable clini- 
cally to report the impact of a particular type of surgery on 
fracture union rates than number of radiographically 
bridged cortices even though the latter may be more quan- 
titatively accurate. 

When categorical data have more than two classes, they 
can either have some sort of natural ordering or not. Blood 
type (A/B/AB/O) or injury type (blunt/penetrating/burn) 
are two examples of data that lack a natural ordering or 
progression from one to the next category and are called 
nominal data. Staging classifications, such as those for cancer
or the grade of an open fracture, have an obvious ordering of
categories and are referred to as ordinal data.


Jargon Simplified: Rates and Proportions 
Rates and proportions are distinct measures, often of
disease, that are useful for describing and comparing
data. A rate is derived by dividing the frequency or
count of events by some measure of time.
For example, the number of infections
that occur in one year would be an infection rate. Propor- 
tions or percentages are unitless ratios of two quantities. 
This could be a quantity such as left ventricular ejection 
fraction or the number of patients out of a group of 100 
that develop a complication after treatment with a cer- 
tain drug. Understanding the difference between rates 
and proportions is crucial to their appropriate use as 
variables and outcome measures of disease. 
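
The distinction can be made concrete with a few lines of Python. This is a minimal sketch with invented numbers; expressing the time denominator in patient-years is an assumption made here for illustration and is not specified in the text.

```python
def proportion(events, group_size):
    """Unitless fraction of a group that experienced the event."""
    return events / group_size


def rate(events, time_at_risk):
    """Number of events per unit of time (here, per patient-year)."""
    return events / time_at_risk


# Hypothetical numbers for illustration only.
complications = 12      # patients who developed a complication
patients = 100          # size of the treated group
patient_years = 150.0   # follow-up time summed over all patients

print(f"proportion = {proportion(complications, patients):.2f}")
print(f"rate = {rate(complications, patient_years):.2f} per patient-year")
```

The same count of events yields different numbers, with different units, depending on which denominator is used.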


Numerical Data 


Numerical data differ from categorical data in that they are 
quantitative. They come in two basic forms, discrete and 
continuous. Discrete data come about when a variable of
interest can only take on certain numerical values, typically
whole numbers. Examples are the number of visits to one’s doctor
or the number of times a patient asks for pain medication. These
are typically counts of events and differ from ordinal cate- 
gorical data in that there is some additivity and propor- 
tionality to discrete data. Four visits to one’s doctor are 
twice as many or two more than two visits to the doctor. 
In contrast, stage IV cancer is not necessarily twice as “bad” as stage
II cancer, and the clinical difference between stage II and 
III is not necessarily the same as going from stage III to IV. 
Continuous data are usually obtained by some form of 
measurement and are not limited to any value other 
than by the precision of the instrument. Height, weight, 
and blood pressure are all continuous data that can be re- 
ported to as many decimal places or fractions of a milli- 
meter of mercury as are provided by the measurement de- 
vice. Even though continuous data are often lumped into 
categories (e.g., ages 1 to 5, 6 to 15, 16 to 25, and so on),
the richness of information is usually optimized through 
leaving them in their native form for statistical analysis. 


Key Concepts: The Primary Classes of Data 
• Categorical
  - Binary
  - Nominal
  - Ordinal
• Numerical
  - Discrete
  - Continuous


Descriptive Statistics


Now that the basic forms of data have been introduced, one 
needs a way to begin to describe and summarize them. De- 
scriptions of data generally aim to provide some informa- 
tion about the variability of data and can be done both gra- 
phically and quantitatively as an important early step in 
analysis. Let us first consider some common graphical re- 
presentations. When categorical variables or events are 
counted, such as the number of hip fractures admitted to 
hospitals in a small town over a year’s time, the bar dia- 
gram is an excellent form of illustration (Fig. 42.1). For nu- 
merical data, “measures of central tendency” are used such 
as means (averages), medians, modes, and frequency dis- 
tributions. Numerical data, such as the age of patients un- 
dergoing primary total hip arthroplasty, can be illustrated 
using another graphic called a histogram (Fig. 42.2).

Fig. 42.1 Bar graph representing the number of patients with a
hip fracture admitted to community hospitals in 2002, by gender
and day of the week.

Fig. 42.2 Histogram showing the distribution of patient ages at
the time of total hip replacement in 500 randomly selected
patients from a hospital total joint registry. The y-axis
indicates the number of patients falling within any given
interval of age.

The variability of data can be quantified in numerous
ways. The simplest way to describe the spread of a dataset
is to report the lowest and highest values (called the range),
which in the example illustrated in Fig. 42.2 would be 41.3
years and 89.6 years, respectively. The range tells us about
the most extreme values of a set of data, but very little
about how those data are distributed. A centile (percentile) is
a value below which a given percentage of the values occur (for
example, the 5th and 95th percentiles). One of the most common
uses of centiles is the interquartile range (the difference
between the 25th and 75th centiles). The interquartile range as
well as the median, 2.5th percentile, and 97.5th percentile values
are nicely visualized using the semiquantitative box-and- 
whisker plot (Fig. 42.3). 
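
As a rough illustration of these summaries, the Python sketch below computes the range, median, and interquartile range with the standard library. The ages are invented for the example, and the "inclusive" quantile method is one of several conventions; statistical packages may give slightly different quartiles.

```python
import statistics

# Hypothetical patient ages (years), for illustration only.
ages = [41, 52, 55, 58, 60, 63, 65, 67, 70, 72, 75, 78, 81, 84, 89]

data_range = (min(ages), max(ages))   # lowest and highest values
median = statistics.median(ages)      # 50th centile

# Quartiles: the 25th, 50th, and 75th centiles.
q1, q2, q3 = statistics.quantiles(ages, n=4, method="inclusive")
iqr = q3 - q1                         # interquartile range

print("range:", data_range)
print("median:", median)
print("quartiles:", q1, q2, q3)
print("interquartile range:", iqr)
```

The range depends only on the two most extreme values, whereas the interquartile range describes the spread of the middle half of the data.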

Now let us consider more quantitative explanations of
data. One of the most important concepts in showing the
variability of data comes from the idea of averaging the
distance each value is from the mean. A simple mean is
calculated by adding up numerical values for a series of
observations (for example, ages of 100 patients undergoing hip
arthroplasty) and dividing this sum by the number of
observations (in this case, 100). Given a mean (m) and an
observed value (x), the distance of that observation from
the mean is (m - x). If we square each of these differences and
sum the squares over all n observations, we get a positive
number that can be averaged (by dividing by n - 1) to obtain a
quantity called the variance (V).


$$V = \frac{\sum (m - x)^2}{n - 1}$$   (Eq. 42.1)


Fig. 42.3 Box-and-whisker plot illustrating data from Figure 42.2
example. The shaded box represents the interquartile range; the
whiskers show the distance between the 2.5th percentile and the
97.5th percentile. The dots falling outside of the whiskers are
values lying outside of the central 95% range.

The square root of the variance (V) is called the standard
deviation (SD) and is a fundamental statistical quality of
data. It is used in many more sophisticated statistical
procedures such as regression (see Chapter 44) and analysis
of variance (see Chapter 45). For data that are more or
less evenly distributed about the mean, the large majority
of observations (> 95%) will lie within 2 SDs of the mean.


This shows the descriptive power of the SD as well as the 
fact that this usefulness depends on the distribution of 
the data. The distribution of data
will be considered in greater detail next. 
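
The variance and SD described above can be computed directly from their definitions. The Python sketch below follows Eq. 42.1 using invented ages and cross-checks the result against the standard library; the variable and function names are illustrative assumptions.

```python
import math
import statistics


def sample_variance(values):
    """Variance as in Eq. 42.1: squared distances from the mean, summed, divided by n - 1."""
    n = len(values)
    m = sum(values) / n
    return sum((m - x) ** 2 for x in values) / (n - 1)


# Hypothetical patient ages (years), for illustration only.
ages = [52, 58, 61, 64, 66, 67, 69, 71, 73, 76, 79, 85]

v = sample_variance(ages)
sd = math.sqrt(v)                      # standard deviation
m = sum(ages) / len(ages)

# Share of observations lying within 2 SDs of the mean.
share_within_2sd = sum(abs(x - m) <= 2 * sd for x in ages) / len(ages)

print(f"variance = {v:.2f}, SD = {sd:.2f}")
print(f"share within 2 SDs of the mean = {share_within_2sd:.0%}")
print("agrees with statistics.variance:", math.isclose(v, statistics.variance(ages)))
```

How large the share within 2 SDs turns out to be depends on the shape of the data, which is the point taken up in the next section.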


Samples, Populations, and Probability Distributions 


As discussed earlier, almost all statistical analyses are 
based on the principle that one takes a sample of subjects 
for the purposes of drawing some inference about a larger 
population of similar individuals (Fig. 42.4). It is usually 
impossible to identify all members of such a population 
and even if we could, to study them would be prohibitively 
expensive. Taking a carefully selected sample that is ran- 
domly selected or representative of the population of in- 
terest is a powerful way to achieve the same goal. However, 
going from population to a sample leads to some degree of 
uncertainty or margin of error because we are estimating 
without knowledge of an entire population. Accounting 
for this uncertainty is critical to drawing inference; there- 
fore, we need to acquaint ourselves with the principles of 
probability that help define this uncertainty. 

Fig. 42.4 Relational diagram of target population, study samples,
and statistical inference.

Many statistical methods use mathematically defined
probability distributions to calculate the theoretical
probability of different values being observed. These
distributions are based on parameters such as a mean and SD. If
the assumption is made that the observed data are a sample
from a population with a distribution that has a known
theoretical form, then it is reasonable to use parameters of
that distribution (those observed) to calculate probabilities
of different values occurring. This parametric approach to
statistics is wide-ranging and ubiquitous in medical research.
However, if these distributional assumptions are not realistic,
then we may end up with results that are not valid. When data
deviate from a so-called normal pattern such as when large
numbers of data points fall close to one extreme or another
(Fig. 42.5), one should use nonparametric or distribution-free
methods. This decision between parametric and nonparametric
methods is an important early step in analysis of one’s data
and requires the
analyst to understand the observed distribution of the 
data. 


Jargon Simplified: Nonparametric Statistics 

Nonparametric statistics refers to statistical methods that
do not involve distributional assumptions. These are
also called distribution-free or rank methods (because
they are often based on analysis of ranks rather than
actual data). Because they do not involve distributional
assumptions, nonparametric methods are most often used
to analyze data that do not meet the distributional
requirements of parametric methods. Skewed data (see
examples in Fig. 42.5), scores such as those from a visual
analogue scale, or those consisting of only a few values
(Apgar scores or stages of disease) are most commonly
analyzed with these techniques. Although nonparametric
methods carry the benefit of being distribution free,
they have the disadvantage of being better suited to
hypothesis testing than to estimation (see below for more
on the difference between estimation and hypothesis testing).


Fig. 42.5 (a) Histogram illustrating the ages of 500 patients in a
community who were evaluated for vertebral compression
fractures at a hospital in a given year. Note the higher
concentration of older subjects than younger subjects.
(b) Histogram illustrating the length of stay of 500 severely
injured polytrauma patients with femur fractures treated with
intramedullary fixation at a given trauma center over a 5-year
period. Note that the majority of patients are discharged
within a month, though there are outliers spreading out over
an additional 2 months.


Understanding the theoretical properties of the various
distributions used in common statistical procedures is beyond
the scope of this text; however, a general understanding
of what role these distributions play in statistical analysis
is necessary. The choice of distribution depends on
the type of data. The so-called normal distribution is one
of the most commonly used distributions and is used for
statistical analysis of continuous data. Categorical data
are commonly described by the binomial distribution.
The Poisson distribution is used to study counts of the
occurrence of an event, such as the number of cases of cancer
in a town in one year. There are several others, but only the
normal distribution will be considered in detail here to
illustrate how distributions are used and what their
relationship is to sample data.


The Normal Distribution 


The normal distribution is one of the most important
probability distributions in statistics and is used to
describe continuous data. It is unimodal (it has a single
peak) and symmetric (Fig. 42.6). We use a probability
distribution by considering the area corresponding to a
particular restricted range of values. Because the total area
under the normal curve is 1, this area corresponds to the
probability of those values. For example, the area (or
probability of values) lying to the left of the mean is 0.5.

The normal distribution is defined by two parameters, 
the mean and the SD. These two parameters can take on 
any value, but the distribution is always defined by the re- 
lations between the two as illustrated. By calculating how 
many SDs from the mean a value lies, one can easily esti- 
mate what the probability of a value larger or smaller 
would be (Table 42.1). Although these probabilities have 
been tabulated and are easily found in tables available in 
most introductory epidemiology or biostatistical texts, it 
is worthwhile keeping the following commonly used 
ranges in mind. 
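
The commonly used ranges tabulated in Table 42.1 can be checked with a few lines of code. The following is a minimal sketch, assuming only the standard relationship between the normal distribution and the error function; the function name prob_within is an illustrative choice, not something defined in this chapter.

```python
import math


def prob_within(k: float) -> float:
    """Probability that a normally distributed value lies within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    inside = prob_within(k)
    print(f"Mean ± {k} SD: inside {inside:.4f}, outside {1 - inside:.4f}")
```

Because the result depends only on the number of SDs from the mean, the same figures apply to any normal distribution, whatever its mean and SD.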

Fig. 42.6 Standard normal distribution (mean = 0, standard
deviation = 1) with the height of the curve representing the
probability density and total area under the curve equaling 1.


Given the distribution of the data illustrated in Fig. 42.2
on ages of patients undergoing total hip arthroplasty, it is
reasonable to apply a normal distribution to the age of
patients undergoing primary total joint arthroplasty.
Because we know the mean (65 years) and SD (10 years) of
this group of patients’ ages, we can calculate the likelihood
that a patient undergoing arthroplasty would be 45 years
of age or younger by calculating how many SDs 45 lies
from the mean value:


(65 - 45)/10 = 2.0


From a table of the proportions of the standard normal
distribution above and below this standard normal deviate
(Table 42.1), it is evident that there is about a 2.3% chance
(half of the 4.6% that lies outside the mean ± 2 SD range,
which includes both the upper and lower tails of the
distribution) that a patient having undergone total joint
replacement at this institution is 45 years of age or younger.
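
The same calculation can be carried out directly rather than read from a table. This sketch uses the mean and SD given in the text and Python's standard-library NormalDist class; the variable names are illustrative only.

```python
from statistics import NormalDist

mean_age = 65.0  # years, from the text
sd_age = 10.0    # years, from the text
cutoff = 45.0    # age of interest

# Signed standard normal deviate: 45 lies two SDs below the mean.
z = (cutoff - mean_age) / sd_age

# Area under the normal curve to the left of the cutoff.
p_younger = NormalDist(mean_age, sd_age).cdf(cutoff)

print(f"z = {z:.1f}, P(age <= {cutoff:.0f}) = {p_younger:.4f}")
```

The lower-tail area is one half of the "outside range" probability for mean ± 2 SD, which is why the text halves the tabulated 4.6%.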


Sampling Distributions 


To estimate what may be true for the target population of a
study, certain assumptions are made, such as that of
distribution (normality) and that the sample mean is the same
as the population mean, because this is the best information
available. But how good is the single sample mean as
an estimate of the population mean? Whether a sample
is reasonably normal (or described by some other
theoretical distribution) can be formally tested, but can
usually be judged by looking at a graph such as a histogram.
With an appreciation of what one’s data “look like,” it is
possible to justify and use the following concepts to derive
inference based on sampled data.


Table 42.1 Normal Distribution as Defined by the Mean
and the Standard Deviation

Range          Probability of Being
               Inside Range    Outside Range
Mean ± 1 SD    0.683           0.317
Mean ± 2 SD    0.954           0.046
Mean ± 3 SD    0.9973          0.0027

Samples taken from a normal distribution can vary
based on sample size, but on average, the mean of a sample
will be the mean of the population. Smaller samples may
have more variability, even if they come from a normal
population. This is illustrated in a simple computer-generated
experiment in which 100 random samples were taken
from a normal population of patients undergoing total
hip arthroplasty (mean age = 65, SD = 10 years). Three
experiments were performed in which the sample size
increased from 10 to 25 to 100 subjects (Fig. 42.7). When
larger samples are taken, the sample mean will more reliably
approximate the true population average age.
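
An experiment of this kind is easy to repeat. The sketch below is not the authors' original simulation; it assumes the population values given in the text, an arbitrary random seed, and Python's built-in random number generator, and it summarizes each experiment by the SD of the 100 sample means.

```python
import random
import statistics

random.seed(42)  # arbitrary seed so the run is repeatable

POP_MEAN, POP_SD = 65.0, 10.0  # population values from the text
N_SAMPLES = 100                # samples drawn in each experiment


def spread_of_sample_means(sample_size: int) -> float:
    """Draw N_SAMPLES random samples and return the SD of their means."""
    means = [
        statistics.fmean(
            random.gauss(POP_MEAN, POP_SD) for _ in range(sample_size)
        )
        for _ in range(N_SAMPLES)
    ]
    return statistics.stdev(means)


spreads = {n: spread_of_sample_means(n) for n in (10, 25, 100)}
for n, spread in spreads.items():
    print(f"n = {n:3d}: SD of the 100 sample means = {spread:.2f} years")
```

The spread of the sample means shrinks as the sample size grows, which is the pattern shown graphically in Fig. 42.7.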

Because in real life one cannot feasibly take multiple 
samples from the same population, one needs another ap- 
proach by which to represent this uncertainty that arises 
from sampling. The SD of a sample allows one to make im- 
portant statements about uncertainty when only one sam- 
ple is taken because of several important qualities of 
means of random samples. Also, the expected (or average) 
value of the variance of a sample is the variance of the 
population. The expected value of the SD of the means of 
several samples is: 

$\sigma / \sqrt{n}$    Eq. 42.2

where $\sigma$ is the SD of the variable ($\sqrt{V}$ from Eq. 42.1) in the
population and n is the size of each sample. This quantity is
known as the standard error of the mean or standard error
(SE). We can estimate the SE from a single sample using
the SD in that sample in place of $\sigma$.

Fig. 42.7 (From top to bottom). (a) Demonstrates a normally
distributed population (A) of patients undergoing total hip
arthroplasty with a mean age of 65 and a standard deviation
of 10 years. (b) 100 random samples taken from A of size 10.
(c) 100 random samples taken from A of size 25. (d) 100 random
samples taken from A of size 100.


In medical research, binary outcomes (for example,
presence or absence of disease) are commonly assessed and
compared between treatment groups. The distributions
of such data are described by the binomial distribution
for a population proportion p. The calculation of the
binomial probabilities is somewhat tedious, and with large
samples, a normal approximation can be used. Under this
approximation, we expect the proportions observed in
repeated samples of the same size (n) to follow a normal
distribution with mean p and SD:

$\sqrt{p(1 - p)/n}$    Eq. 42.3

The estimate of the SE from a single sample of binary
categorical data is thus $\sqrt{p(1 - p)/n}$, with the sample
proportion used in place of p. This approximation is
justified given that the distribution of the sample means
(or proportions) will be nearly normal whatever the
distribution of the variable in the population as long as the
samples are large enough (when both p and 1 - p are > 5/n,
that is, when both np and n(1 - p) exceed 5).
Although it is by no means necessary to memorize these
equations for day-to-day analysis, the intuition they
convey is important to understanding how a majority of
statistical analyses give inference to a target population
from the study sample.
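
To make Eq. 42.2 and Eq. 42.3 concrete, the following sketch estimates both standard errors from a single sample. The function names and the example ages are hypothetical and were chosen only for illustration.

```python
import math
import statistics


def se_mean(sample):
    """Estimated SE of the mean: sample SD divided by sqrt(n) (Eq. 42.2)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def se_proportion(events, n):
    """Estimated SE of a proportion: sqrt(p(1 - p)/n) (Eq. 42.3)."""
    p = events / n
    return math.sqrt(p * (1 - p) / n)


def normal_approx_ok(events, n):
    """Rule of thumb from the text: both np and n(1 - p) exceed 5."""
    return events > 5 and (n - events) > 5


ages = [58, 72, 65, 61, 70, 66, 59, 75, 63, 68]  # hypothetical ages
print(f"SE of the mean age: {se_mean(ages):.2f} years")
print(f"SE of 30 events in 100 patients: {se_proportion(30, 100):.4f}")
print(f"Normal approximation reasonable? {normal_approx_ok(30, 100)}")
```

The rule-of-thumb check is worth running before relying on Eq. 42.3, because the approximation becomes poor when the number of events (or nonevents) is very small.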
Estimation and Hypothesis Testing


Most statistical analyses that are done fall into one of two 
categories, estimation or hypothesis testing. In medical re- 
search, there is typically a comparison of interest (associa- 
tion, difference, etc.) and the numerical value correspond- 
ing to the comparison of interest is often called the effect. 
Estimation covers a broad range of statistical procedures 
that yield the magnitude of an effect of interest as well as 
the precision of that estimate. Hypothesis testing concerns 
itself with understanding the likelihood of having observed 
a difference or association from data if no such effect exists 
in the population. Although these two concepts may ap- 
pear similar, they actually convey very different informa- 
tion that should be well understood by researchers and 
those who strive to understand reported results in the 
literature. 


Estimation 


Estimating procedures involve the calculation of the size of
a relationship or difference of interest. Results are
commonly reported in terms of a mean difference, relative
risk, or odds ratio. What is of greater importance and the
key to inference is the variability or precision of this
measure. Estimation gives us this quantity, typically in the
form of a confidence interval (CI),² which informs one of how
large an error might be made with an estimate of effect.

The standard error (SE), as previously defined, is a key to
the process of estimation. It allows one to go from
parameters of a sampling distribution to a property of the
target population because one asserts that the mean or
proportion observed in a sample is the best estimate of the
true value in the population and that the distribution of
values obtained in several samples would be approximately
normal for large samples. There are formulas for calculating
the SEs of comparative measures of effect, such as
differences in means and proportions, as well as relative
measures such as relative risks or odds ratios; the reader is
referred to the referenced texts for thorough explanations.³,⁴

Fig. 42.8 Plot of 95% confidence intervals (horizontal lines) from
100 random samples taken from a population whose mean is 65
years. They are stacked from bottom to top in order of the
sample’s mean value.


A CI is a range of values that we can be confident includes
the true population value. A CI for the estimated mean
extends to either side of the mean by a multiple of the SE.
For example, a 95% CI is constructed so that, over repeated
sampling, such intervals contain the true population mean
with probability 0.95. This principle can be illustrated
using the computer-generated experiment from Fig. 42.7, in
which random samples were taken from a population of patients
undergoing total hip arthroplasty. Remember that the mean of
that population is 65 years of age. With each of the 100
random samples that are taken, a mean, SD, and SE can be
calculated. From these, the 95% CI for each sample can be
estimated. Each of these 100 95% CIs is plotted as a
horizontal bar in Fig. 42.8. Only 7 out of 100 samples’ 95%
CIs do not contain the population mean of 65, nearly the same
as that expected given the interpretation of the 95% CI as
containing the true population mean ~95% of the time. No
matter what distribution is used to describe the data, the
same principles apply: the SE of the estimate is calculated,
and from it one obtains a CI of the desired confidence level.
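
The coverage property of the 95% CI can be examined by simulation. The sketch below is an assumption-laden illustration rather than a reproduction of Fig. 42.8: it uses the population values from the text, an assumed sample size of 100, the multiplier 1.96 from the normal distribution, an arbitrary seed, and 1000 samples (more than the 100 in the text) so that the estimated coverage is more stable.

```python
import math
import random
import statistics

random.seed(7)  # arbitrary seed for repeatability

POP_MEAN, POP_SD = 65.0, 10.0  # population values from the text
SAMPLE_SIZE = 100              # assumed sample size
N_SAMPLES = 1000               # number of simulated samples


def ci_95(sample):
    """Return the 95% CI for the mean as (lower, upper): mean ± 1.96 SE."""
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - 1.96 * se, mean + 1.96 * se


covered = 0
for _ in range(N_SAMPLES):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    lower, upper = ci_95(sample)
    if lower <= POP_MEAN <= upper:
        covered += 1

coverage = covered / N_SAMPLES
print(f"{covered} of {N_SAMPLES} intervals contain the population mean "
      f"({coverage:.1%})")
```

Any single interval either contains the population mean or it does not; the 95% refers to how often intervals built this way succeed over many samples.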


Hypothesis Testing 


Hypothesis testing, though somewhat less intuitive than
estimation, is used in the majority of reported results of
statistical analysis in medicine. For hypothesis testing we
state a null hypothesis that the effect of interest is zero.
This statistical null hypothesis is often the negation of
the research hypothesis that generated the data. We also
have an alternative hypothesis, which is usually simply
that the effect of interest is not zero. Having set up our
null hypothesis, we can then evaluate the probability
that we could have obtained the observed data (or data
that were more extreme) if the null hypothesis were
true. This probability is usually called the P value; the
smaller the P value, the less compatible the observed data
are with the null hypothesis. We call this process a test
because we are deciding whether to reject the null hypothesis
or not. Notice that the P value gives no information about
the magnitude of the effect of interest. For this and other
reasons the approach based on estimation and confidence
intervals is often considered superior.⁵,⁶


Jargon Simplified: Hypothesis Testing 
Assume an investigator has the following research question:
Does the use of antibiotic irrigation solution change the
risk of infection in patients with open fractures as compared
with normal saline alone?

Null hypothesis: The statistical hypothesis that two or 
more population distributions do not differ from one an- 
other and that any differences or effects observed in re- 
sponse to treatment result from chance alone. (Antibiotic 
irrigant does not result in a different incidence of infection 
compared with normal saline.) 

Alternative hypothesis: The alternative hypothesis sug- 
gests that any differences or effects observed in response 
to treatment are a result of the treatment. The alterna- 
tive hypothesis is deferred to if the null hypothesis is re- 
jected given the data and using various forms of hypoth- 
esis-testing procedures. (Antibiotic irrigant does result in 
a different incidence of infection compared with normal 
saline solution.) 


How does one evaluate the probability of obtaining data if
the null hypothesis is true? The answer lies in calculating a
test statistic, a value that can be compared with the known
distribution of what we expect when the null hypothesis is
true in the population. The general form of the test statistic
can be expressed in relation to the observed value of the
quantity of interest and the value expected if the null
hypothesis were true:


$$\text{Test statistic} = \frac{\text{observed value} - \text{hypothesized value}}{\text{standard error of observed value}}$$


A test statistic is evaluated by calculating the probability
that we could have observed that value, or one that is
more extreme, if the null hypothesis is true. The test
statistic (which may come from a t test or chi-square analysis,
to be discussed further in Chapter 43) is compared with
standardized numbers, from a computer program or a table
found in most introductory statistical texts, corresponding
to specific P values.⁷ There are numerous types of test
statistics available, but most follow a similar form in
converting calculations from data into a P value.
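
As an illustration of the general form of the test statistic, the sketch below compares a hypothetical sample mean with a hypothesized population mean. All numbers are invented, and for simplicity the standard normal distribution is used as the reference distribution; with a sample of this size a t test (Chapter 43) would normally be preferred and would give a somewhat larger P value.

```python
import math
from statistics import NormalDist

# Hypothetical sample: 25 patients, mean age 68 years, SD 10 years.
observed_mean = 68.0
hypothesized_mean = 65.0  # value expected under the null hypothesis
sample_sd = 10.0
n = 25

standard_error = sample_sd / math.sqrt(n)
test_statistic = (observed_mean - hypothesized_mean) / standard_error

# Two-sided P value: probability of a statistic at least this extreme
# (in either direction) if the null hypothesis is true.
p_value = 2 * (1 - NormalDist().cdf(abs(test_statistic)))

print(f"test statistic = {test_statistic:.2f}, P = {p_value:.3f}")
```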

When a P value is below some cutoff (say 0.05), the result
is often called statistically significant. It is for this reason
that hypothesis testing is also referred to as significance
testing. The use of the word significant can lead to much
confusion over what is statistically significant and what
is clinically significant. Because medical journals report
many authors’ results using hypothesis testing, many journals
restrict the use of the word significant to those results that
meet the statistical definition. The use of cut-points for P
values leads to treating the analysis as a process of decision
making (see Jargon Simplified below on type I and type II
errors made in interpretation of study results). Within this
framework, it is customary to consider a statistically
significant result to indicate a real effect and a
nonsignificant result to indicate no effect. This is
problematic in that the uncertainty of the result is
obscured (whereas it is explicit when a CI is calculated). It
is not reasonable to conclude that a nonsignificant result
indicates no clinical effect just because the null hypothesis
cannot be ruled out. The difference between a P value of 0.045
and one of 0.055 should not alter the conclusion about an
association in the data. P values should be reported in their
entirety, allowing readers to make up their own minds.
Statistical significance in no way implies clinical
significance, which requires clinical knowledge of subject
matter and judgment.


Jargon Simplified: Type I and II Error (α and β Error) and
The Power of a Study

Two general classes of error can be made when using the
P value to make a decision about whether an effect is real
or not. First, one can obtain a significant result that leads
to rejection of the null hypothesis when it is actually
true. This is called type I error and may be thought of
as a false positive result. Alternatively, one can obtain a
nonsignificant result leading one to accept the null
hypothesis when it is not true. This is called type II error
and can be thought of as a false negative result. The
probabilities of type I and II error are sometimes referred
to by the Greek letters α and β, respectively.


The power of a study is the probability that an association
of interest would be found with a given sample
size if one truly exists in the target population, and is
expressed as the quantity (1 - β).


The threshold level at which one may consider a P value 
to be statistically significant also depends on how many 
times one sample group is compared with another. The 
more times that one looks for a difference between two 
groups, the more likely one is to find a difference there 
that has occurred purely by chance (i.e., the type I error 
rate increases). This is an important consideration when 
two groups defined by treatment received are compared 
with respect to multiple outcomes of interest, or multiple 
subgroup analyses are performed. When multiple testing 
occurs, the cutoff P value for statistical significance should 
be lowered accordingly. 
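
The effect of multiple testing on the type I error rate can be quantified under the simplifying assumption that the tests are independent. The sketch below also shows the Bonferroni adjustment, which is one common way of lowering the cutoff; the text does not prescribe a particular method, so this choice is an assumption made for illustration.

```python
def familywise_error(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_cutoff(alpha: float, n_tests: int) -> float:
    """Per-test cutoff that keeps the overall type I error at or below alpha."""
    return alpha / n_tests


for k in (1, 5, 10, 20):
    print(f"{k:2d} tests: P(any false positive) = "
          f"{familywise_error(0.05, k):.2f}, "
          f"Bonferroni cutoff = {bonferroni_cutoff(0.05, k):.4f}")
```

With 10 independent comparisons at a cutoff of 0.05, the chance of at least one spurious "significant" finding is already about 40%.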

Estimation and hypothesis testing have been introduced
here as two different statistical approaches to inference
from study data. They have been contrasted to highlight
relative strengths and weaknesses. In practice, they are
commonly presented side-by-side in reports of study results as
illustrated in Examples from the Literature below. In this
example, the authors compare the proportion of patients
who developed femoral nonunion who were treated with
unreamed intramedullary nailing (A) to that of patients
treated with reamed intramedullary nailing (B). A P value
of 0.049 was adequate to reject the authors’ null hypothesis
of no difference in treatment effect between the two
groups. However, one could not appreciate the magnitude
of the difference in incidence of nonunion between the
two treatments without the reported relative risk (A/B =
4.5). More important, the estimated confidence interval
indicates that the true population risk of nonunion is
between 1- and 20-fold higher for patients receiving
unreamed intramedullary nailing versus those receiving
reamed nailing. Although the P value and the CI both suggest
an effect that is barely statistically significant, it is only
through the estimation of effect and knowledge of its
margin of error that one can infer whether that effect may be
of sufficient size to be of clinical importance.
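
The relative risk and its confidence interval can be approximately reproduced from the counts reported in the trial abstract (8 nonunions among 107 fractures without reaming and 2 among 121 with reaming). The sketch below uses the standard large-sample formula for the SE of the logarithm of a relative risk; this is an assumption, as the method the trial authors used is not stated here.

```python
import math

# Counts as reported in the trial abstract
nonunion_unreamed, total_unreamed = 8, 107
nonunion_reamed, total_reamed = 2, 121

risk_unreamed = nonunion_unreamed / total_unreamed
risk_reamed = nonunion_reamed / total_reamed
relative_risk = risk_unreamed / risk_reamed

# SE of log(RR), then a 95% CI calculated on the log scale
se_log_rr = math.sqrt(
    1 / nonunion_unreamed - 1 / total_unreamed
    + 1 / nonunion_reamed - 1 / total_reamed
)
ci_lower = math.exp(math.log(relative_risk) - 1.96 * se_log_rr)
ci_upper = math.exp(math.log(relative_risk) + 1.96 * se_log_rr)

print(f"RR = {relative_risk:.1f}, 95% CI {ci_lower:.1f} to {ci_upper:.1f}")
```

The interval is wide because only 10 nonunions occurred in total, which is why the lower limit sits so close to 1 (no difference).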


Examples from the Literature: The Use of Estimation
and Hypothesis Testing in the Analysis of a Randomized
Controlled Trial

Source: Canadian Orthopaedic Trauma Society. Nonunion
following intramedullary nailing of the femur
with and without reaming. Results of a multicenter
randomized clinical trial. J Bone Joint Surg Am
2003;85-A(11):2093-2096.

Abstract

Background: Intramedullary nailing of the femur without
reaming of the medullary canal has been advocated
as a method to reduce marrow embolization to the lungs
and the rate of infection after open fractures. The use of
nailing without reaming, however, has been associated
with lower rates of fracture-healing. The purpose of
this prospective study was to compare the rate of union
of femoral shaft fractures following intramedullary
nailing with and without reaming.

Methods: Two hundred and twenty-four patients were
enrolled in a multicenter, prospective, randomized
clinical trial to compare nailing without reaming and nailing
with reaming. One hundred and six patients with 107
femoral shaft fractures were treated with a smaller
diameter nail without reaming of the canal, and 118
patients with 121 fractures had reaming of the canal and
insertion of a relatively larger diameter nail. Patients
were followed at six-week intervals until union occurred
or a nonunion was diagnosed.

Results: The two groups were comparable with regard
to the measured patient and injury characteristics. Eight
(7.5%) of the 107 fractures in the group without reaming
had a nonunion compared with two (1.7%) of 121
fractures in the group with reaming (P = 0.049). The relative
risk of nonunion was 4.5 times greater (95% confidence
interval = 1 to 20) without reaming and with use of a
relatively small-diameter nail.

Conclusion: Intramedullary nailing of femoral shaft
fractures without reaming results in a significantly higher
rate of nonunion compared with intramedullary nailing
with reaming.


Statistical Modeling


Much of statistical analysis (estimation and hypothesis
testing) is based on modeling. Statistical models are
mathematical relationships between two or more variables that
give an approximate description of the observed data. They
should not be thought of as explaining underlying
mechanisms, but rather as simplifications that are compatible
with the data and provide us with some inference as to
associations seen in the data. The most common parametric
models fall into the framework of general linear models. These
models are also known as additive in that an observed
dependent variable can be explained by a model in which the
effects of different influences or independent variables are
added. For example, multiple factors (independent
variables) such as age, smoking status, degree of soft tissue
injury, and size of cortical defect may contribute to the
incidence of tibial fracture nonunion (dependent variable).
In most situations, a multivariate approach should
be taken because a bivariate association
(i.e., between one treatment or risk factor and effect) that
is not influenced by some other factor is extremely rare.
Although the results of such multivariate analysis are
commonly presented, the details of how the models were
selected are not. Readers may be led to assume the results
are accurate when they may have been derived using
inappropriate models. Assumptions are made when a model is
fit (such as that of normality of the data discussed earlier)
and it is important that these assumptions be verified and
the overall fit of the model to the data be assessed. This is
determined in terms of the amount of variability in the data
explained by the model and how well the model predicts
individual outcomes for a given observation. These are
diagnostic procedures that should be performed by a
statistician or experienced data analyst, but understood by the
clinical researcher.


Jargon Simplified: Multivariate Regression 

Analyses of multiple potential predictors or risk factors
for a particular outcome are usually described
in terms of so-called dependent (Ys) and independent
variables (Xs) that constitute a multivariable analysis.
These terms are somewhat misleading, as there is nothing
necessarily dependent or independent about the
variables, and regression is a vague description of the
mathematics behind what is actually going on. Most
such analyses are based on the general linear model:


$Y = A + B_1 X_1 + B_2 X_2 + \dots + B_p X_p$


where the expectation (or mean value) of Y is an additive
combination of an intercept (A) and p explanatory
independent variables multiplied by their respective
coefficients ($B_1$ through $B_p$). Each coefficient represents
an estimate of effect depending on the type of general
linear model (examples: mean difference for linear
regression and log odds ratio for logistic regression).
Multivariable analysis allows one to estimate the association
between dependent and independent variables, controlling
for the influence of other independent variables.
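
The additive structure of the model can be seen by evaluating it for a single patient. The sketch below assumes a logistic regression of tibial fracture nonunion; the intercept, coefficients, and patient values are entirely hypothetical and were invented for illustration, not taken from any fitted model.

```python
import math


def linear_predictor(intercept, coefficients, values):
    """Return A + B1*X1 + ... + Bp*Xp."""
    return intercept + sum(b * x for b, x in zip(coefficients, values))


# Hypothetical logistic model (all numbers invented):
# X1 = age in years, X2 = smoker (0/1), X3 = severe soft tissue injury (0/1)
intercept = -3.0
coefficients = [0.02, 0.9, 0.6]
patient = [40, 1, 0]  # a 40-year-old smoker without severe injury

log_odds = linear_predictor(intercept, coefficients, patient)
probability = 1 / (1 + math.exp(-log_odds))  # convert log odds to a probability
odds_ratio_smoking = math.exp(coefficients[1])

print(f"log odds = {log_odds:.2f}, predicted probability = {probability:.2f}")
print(f"odds ratio for smoking = {odds_ratio_smoking:.2f}")
```

In a logistic model each coefficient is a log odds ratio, so exponentiating the smoking coefficient gives the odds ratio for smoking with the other variables held fixed.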
Conclusions


In this chapter, the fundamental principles or purposes of 
statistical analysis in clinical research have been discussed. 
Knowing the various forms of data (categorical versus nu- 
merical) is crucial to choosing the right statistical methods 
to accurately describe them. The relationship between a 
study sample and the target population from which it is 
derived has been described and shown to be a key concept


Statistical Means and Proportions


“Statistics: The only science that enables different experts using the same figures to draw
different conclusions.”

— Evan Esar


Summary


The goal of this chapter is to explore the practical concepts
of why, when, and how to conduct basic statistical
comparisons of data means and data proportions. The commonly
used t test and chi-square analysis are presented, and
clinical examples are provided that demonstrate the ease with
which these tests can be conducted using programs such as
Microsoft Excel (Microsoft, Inc., Redmond, WA).


Introduction


From earlier chapters, you have become familiar with
evidence-based medicine and know what makes a good
outcome measure and a reliable measuring instrument. You
know how to design a randomized controlled trial. You
have collected data and are finally ready to begin the
analysis. In the past, this meant either spending long hours
with a calculator manipulating convoluted formulas or
developing computer code to do the analyses for you.
Happily, this is not the case anymore. No longer do you
need to be a statistician to conduct the appropriate
analyses (although it is always a good idea to consult with a
statistician if you are unsure about how to analyze your
data). A few basic steps, if followed properly, will allow
you to analyze most data.

Performing the actual statistical test is easy to do using
programs such as SPSS (SPSS Inc., Chicago, IL), MiniTab
(MiniTab Inc., State College, PA), Matlab (The MathWorks,
Inc., Natick, MA), or Excel, which automate the procedures.
Deciding which statistical test is appropriate may be the
biggest challenge. This chapter will help to guide you
through the process of when to use a particular statistical
test along with examples of how to conduct the actual test.
The mathematics and theory behind each of the statistical
tests are outside the scope of this chapter, but can be found
in the recommended references at the end of the chapter.


Comparing Means and Proportions

Why the Comparison


A clinical scenario: you use proximal femoral nails for
treatment of unstable trochanteric fractures. Your partner
uses dynamic hip screws. You want to conduct a study to
determine whether there are differences in the results.

You would like your results to have clinical significance.
You would like your results to apply to the general
population, but in any clinical trial money and time are
limited, so you cannot test everyone. Luckily, you do not need
to. You need only test a sample that you hope is
representative of the whole population and then extrapolate
the results appropriately to the population. Statistics help
with the extrapolation; they indicate how well the sample
predicts the entire population. No data from a sample
predict the population exactly. There is always some
variation, some error, and some exceptions.

By using the appropriate statistical test, you can
determine whether or not the differences in a particular
outcome are due to the different treatments (proximal
femoral nail or dynamic hip screw) or are just due to random
chance. When you pose the research question, you
hypothesize one way or the other: You can hypothesize that
any differences in outcome are due to the treatments, or
you can hypothesize that the differences in outcome are
independent of the treatments (no differences between
treatments).


Key Concepts: Statistical Testing 

• Allows sample data to be extrapolated to a population

• Determines whether differences in outcomes are due
to treatments or due to random chance


In our example, we can take our general question and
turn it into a more specific clinical question that has a
measurable outcome: Is there a difference in risk of cutout
between a proximal femoral nail and a dynamic hip screw
following treatment of unstable trochanteric fractures?
Our hypothesis (what a statistician would call an
“alternative hypothesis”) is that the risk of cutout is
different between the two treatments. Note that we did not
specifically hypothesize that one treatment has a higher (or
lower) risk of cutout than the other; we just hypothesized
that they are different (could be higher or lower). This may
seem like a subtle difference, but it changes the details of
the analysis, so it is important. Note that the hypothesis is
suggestive of differences, changes, or improvements.

Once we have defined our specific hypothesis, we also
need to define a null hypothesis, which is basically the
opposite of our hypothesis. Our null hypothesis is that the
risk of cutout is no different between the two treatments.
Note that the null hypothesis is suggestive of status quo, no
differences, no changes, or no improvements.


Key Concepts: Hypothesis 

• A hypothesis is suggestive of differences, changes, or
improvements.

• A null hypothesis is suggestive of status quo, no
differences, no changes, or no improvements.


The statistical analysis will indicate, with some level of cer- 
tainty, whether the hypothesis is true or the null hypoth- 
esis is true. It will indicate whether any differences in the 
observed cutout are due to the different treatments or 
are merely due to random chance. 

How do we assess and communicate these differences?
We apply the appropriate statistical test to the data and
report the P value. The P value is commonly described as the
probability that differences between treatments are due to
chance alone.¹ More precisely, it is the probability of
observing a difference at least as large as the one in our
sample if the null hypothesis were true. Thus, the lower the
P value, the harder it is to explain the observed difference
by chance alone, and the stronger the evidence that the
difference is due to the treatments. The higher the P value,
the more easily the observed difference could have arisen by
random chance. When a P value of 0.05 is reported, it
indicates that a difference this large would occur only 5% of
the time by chance alone if the treatments were truly no
different. As an example, if we compared mean time to union
of tibial shaft fractures using either plating or
intramedullary nailing, a statistical comparison of mean days
to healing might yield a P value of 0.001. That P value
indicates that, if the two treatments truly had the same time
to healing, a difference this large would occur by chance only
0.1% of the time. A low P value is therefore evidence against
the null hypothesis, whereas a high P value means only that
the data are compatible with the null hypothesis; it does not
show that the hypothesis is false.


Jargon Simplified: P Value 

The P value is the probability of seeing differences between
treatment outcomes at least as large as those observed if
chance alone were at work. A P value of 0.05 indicates that
chance alone would produce such differences 5% of the time.
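
One way to make this definition concrete is a permutation test, which is not the method used in this chapter but computes a P value directly from its definition: the treatment labels are shuffled many times (as if the null hypothesis were true) and we count how often the shuffled difference is at least as large as the observed one. The sketch below is a minimal illustration; the function name and the days-to-union values are hypothetical and are not study data.

```python
import random

def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Two-sided permutation P value for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        # Shuffling the labels mimics a world where treatment has no effect.
        rng.shuffle(pooled)
        mean_a = sum(pooled[:n_a]) / n_a
        mean_b = sum(pooled[n_a:]) / (len(pooled) - n_a)
        if abs(mean_a - mean_b) >= observed - 1e-12:
            at_least_as_large += 1
    return at_least_as_large / n_shuffles

# Hypothetical days to union (illustrative values only)
plating = [112, 120, 131, 118, 125, 140, 122, 128]
nailing = [98, 105, 110, 101, 115, 108, 99, 104]

p = permutation_p_value(plating, nailing)
print(f"P = {p:.4f}")
```

With these made-up values almost no shuffle reproduces a difference as large as the observed one, so the estimated P value is very small.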


In the biomedical field, it is customary to use P < 0.05 as the
cutoff to designate when something is “statistically
significantly different.”² So any P value < 0.05 is deemed a
significant difference, whereas a P > 0.05 is deemed not
significantly different. It is worth noting that there is
nothing magical about a P value of 0.05, and there is little
difference between a P value of 0.051 and 0.049 even though
0.049 is deemed significant and 0.051 is not. Typically, if a
P value is > 0.05, but < 0.10, it is described as “approaching
significance.”³ For example, if we looked at intraoperative
blood loss from minimally invasive total hip arthroplasty
compared with a traditional open technique using the
appropriate statistical test and the P value was 0.09, we would
report that the difference in mean blood loss between the two
treatments was not statistically significant, but approached
statistical significance.

If P < 0.05, we conclude from our study data that there is
a statistically significant difference between treatments.
However, if P > 0.05 this does not necessarily indicate
that there are no differences between treatments. This
may seem subtle, but it is not. For example, you may be
conducting a study comparing the rate of spinal fusion
after treatment with either interbody cages filled with
autologous bone or cages filled with a novel synthetic
bone substitute. A statistical comparison of data from a
small study might yield P = 0.29. From these data, you
would conclude that there are no detectable differences
between treatments (and fail to reject the null hypothesis).
However, you cannot necessarily conclude from this that the
treatments are equal. In other words, just because P >
0.05, this does not necessarily indicate that the treatments
give you the same outcome. In this example, it would be
inappropriate to conclude that the synthetic bone substitute
works as well as the autologous bone.

Think of it this way: A murder suspect goes on trial. The 
evidence is sketchy, the bloody glove does not fit well, and 
ultimately, the defendant is found not guilty by the jury. 
Just because the defendant was not proven guilty, this 
does not mean he was proven innocent. Likewise, P > 
0.05 just means that you could not detect differences be- 
tween treatments. It does not mean that the treatments 
are equal. Even though the analysis of your sample data in- 
dicates no differences, if you had conducted the study on 
an entire population, your results may be different. You 
cannot be 100% sure of how your data from a sample of pa- 
tients can be extrapolated to all patients - there is uncer- 
tainty, differences due to chance alone, and inherent er- 
rors. (To determine if they are truly no different, there is 
an additional step, which indicates the power of the test. 
This will be approached later in the chapter.) 

There are two general types of errors that are important
to understand as they relate to reporting comparisons of
outcomes: type I error (also known as α error) and type
II error (also known as β error). Type I error is the
probability that the sample data from your study indicate
significant differences between treatments, when in fact the
reality for the entire population is that there are no
significant differences (observed differences are due to
chance).⁴ In other words, your study data indicate that the
alternative hypothesis is true for the study sample data, but
in reality the null hypothesis is true for the general
population. This does not happen too frequently.

The type I error rate (α) is typically defined at the
beginning of a study and designates how much type I error is
acceptable. Again, the customary cutoff for biomedical studies
is α < 0.05 or 5%. That means you are willing to accept a 5%
chance that your study data indicate a significant difference
in outcomes based on treatment, when in reality there
is none (observed differences are due to chance). Another
way of stating this is that you are willing to accept a 5%
chance of a false positive result from your study.
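
The meaning of a 5% false positive rate can be checked with a small simulation. The sketch below makes several assumptions that are not part of this chapter: both groups are drawn from the same normal population (so the null hypothesis is true by construction), the standard deviation is treated as known so that a simple z test can be used, and all numbers (mean of 120, SD of 10, 20 patients per group) are invented for illustration.

```python
import random
from statistics import NormalDist

def two_sided_z_p(group_a, group_b, sigma):
    """P value for a difference in means when the SD (sigma) is known."""
    n_a, n_b = len(group_a), len(group_b)
    diff = sum(group_a) / n_a - sum(group_b) / n_b
    z = diff / (sigma * (1 / n_a + 1 / n_b) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(42)  # fixed seed so the sketch is repeatable
alpha, n_studies, n_per_group = 0.05, 4000, 20

false_positives = 0
for _ in range(n_studies):
    # Both "treatments" come from the SAME population: no true difference.
    a = [rng.gauss(120, 10) for _ in range(n_per_group)]
    b = [rng.gauss(120, 10) for _ in range(n_per_group)]
    if two_sided_z_p(a, b, sigma=10) < alpha:
        false_positives += 1

rate = false_positives / n_studies
print(f"Share of simulated studies with P < {alpha}: {rate:.3f}")
```

The printed share should land close to α, because every "significant" result in this simulation is a type I error.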

Type II error is the probability that the sample data from
your study indicate no significant differences between
treatments, when in fact the reality for the entire
population is that there are significant differences between
treatments.⁵ In other words, your study data indicate that the
null hypothesis is true for the study sample data, but in
reality the alternative hypothesis is true for the general
population. This is a much more common error.

Again, β error is typically defined at the beginning of the
study and designates how much type II error is acceptable.
The customary cutoff for biomedical studies is typically β <
0.20 or 20%.⁶ Often the type II error is expressed as the
study power, which is defined as 1-β and typically is
acceptable if 1-β > 0.80 or 80%. This means that you are
willing to accept a 20% chance that your study data indicate no
significant difference in outcomes based on treatment
(differences are due to chance), when in reality differences are
due to treatment. Another way of stating this is that you
are willing to accept a 20% chance of a false negative result
from your study.
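
To give a feel for how power depends on sample size, the sketch below uses a common normal approximation for comparing two means. It is an illustration under stated assumptions, not the power method of this book: the effect size is expressed in standard deviation units, the groups are equal in size, and the function name and example numbers are hypothetical.

```python
from statistics import NormalDist

def power_two_sample(effect_size, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided comparison of two means.

    effect_size is the true difference in means divided by the common SD.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)        # about 1.96 for alpha = 0.05
    shift = effect_size * (n_per_group / 2) ** 0.5    # expected z statistic
    # Probability that the z statistic falls beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

power_small = power_two_sample(effect_size=0.5, n_per_group=20)
power_large = power_two_sample(effect_size=0.5, n_per_group=64)
print(f"n = 20 per group: power = {power_small:.2f}, beta = {1 - power_small:.2f}")
print(f"n = 64 per group: power = {power_large:.2f}, beta = {1 - power_large:.2f}")
```

Under these assumptions a study with 20 patients per group falls well short of the customary 80% power, whereas 64 per group just reaches it.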

In the example above, a P > 0.05 indicates that there are
no detectable differences between the autologous bone and
the synthetic bone substitute. However, if the power of
the test is lower than 80% (1-β < 0.8), then that conclusion
may be a type II error. Only if P > 0.05 and 1-β > 0.8 can you
conclude that the treatments are no different.

If P > 0.05 and the power is lower than 80% (1-β < 0.8),
then you can make no statistical conclusions from your
data. Type I and type II errors are summarized in Table
43.1.


Key Concepts: What P Value Indicates 

• If P < 0.05, this indicates significant differences
between treatment groups.

• If P > 0.05, this only indicates that no differences could
be detected; it does not necessarily indicate that the
treatments are not different.

• If P > 0.05 and 1-β > 0.8, this indicates that there are no
differences between the treatments.

• If P > 0.05 and 1-β < 0.8, no statistical conclusions can
be made about the data.
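
The rules in this box can be written as a small decision function. This is only a sketch of the box's logic: the function name and return strings are hypothetical, and treating values exactly equal to a cutoff as falling on the "not significant" or "underpowered" side is an assumption, since the box does not address them.

```python
def interpret(p_value, power, alpha=0.05, min_power=0.80):
    """Map a P value and study power (1 - beta) to the conclusions in the box."""
    if p_value < alpha:
        return "significant difference between treatment groups"
    # From here on no difference was detected; power decides what that means.
    if power > min_power:
        return "no difference between treatments"
    return "no statistical conclusion can be made"

# The spinal fusion example: P = 0.29 from a small (underpowered) study.
print(interpret(p_value=0.29, power=0.40))
```
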


Table 43.1 Type I and II Errors in Drawing Conclusions
from Sample Data

                              Population data
Sample data                   Hypothesis is true       Hypothesis is false
                              (differences are due     (differences are not
                              to treatment)            due to treatment)

Hypothesis is true            Correct conclusion       Type I error*
(differences are due to       from sample data
treatment, low P value)

Hypothesis is false           Type II error†           Correct conclusion
(differences are not due                               from sample data
to treatment, high P value)

* False positive conclusion. The tolerable probability of this
occurring is α.

† False negative conclusion. The tolerable probability of this
occurring is β.


Determination of Differences in Outcome Values 


We communicate comparisons of outcomes by reporting P
values and the power of the test. To calculate these values,
we apply the appropriate statistical test to the mean values
or proportions of the data. The specific test and the way we
prepare the data for analysis depends on several factors in- 
cluding the type of data, the distribution of the data, the 
number of samples, the number of treatment groups, 
and the study design. For a particular outcome measure, 
the statistical test is used to determine whether differ- 
ences between values for each treatment are significant. 
Essentially, the statistical test is used to determine 
whether the difference in values between treatments is 
different from zero. If the two treatments truly have the 
same outcome, then we would expect the difference be- 
tween outcomes to be zero. If the two treatments really 
do have different effects on outcome, then we expect the 
difference between outcomes to be nonzero. In other
words, if the outcome is dependent on treatment, then
the difference between outcomes is nonzero.

Unfortunately, due to random variations inherent in
experimental data, the difference between outcomes is
rarely zero even if two treatments are the same. For example,
if range of motion (ROM) is assessed as an outcome following
anterior cruciate ligament reconstruction and the
only difference between two patient groups is whether 
their treated knee is the left or right, we would expect 
that there would be no differences in outcome between 
the two treatment groups. Thus, the difference between 
mean ROM of left knee patients and ROM of right knee pa- 
tients should be zero. However, the likelihood of it actually 
being zero is very small. Due to random variations, the dif- 
ference in ROM will be nonzero. It will likely be small, but 
not zero. It is because of these random variations that we
need to use statistical tests to tell us whether the differ- 
ences in outcome values are really zero hidden by some 
random variations (and the treatment groups are no differ- 
ent), or whether they are truly different. 

Therefore, the statistical tests indicate whether the differ- 
ence between the outcomes of each of the two treatment 
groups is significantly different from zero. They indicate 
whether the difference between the mean outcomes of 
each of the treatment groups is due to random chance or 
due to the treatments themselves. This is sounding a lot 
like the definition of a P value. Statistical tests analyze 
the difference between the outcome values of the treatment 
groups to determine the probability that the difference in 
mean outcome is due to chance only - which is the P value. 

For continuous data, we use the mean outcome values 
for comparison between treatment groups. For categorical 
data, we generally use the proportion of outcome values in 
each category for comparison between treatment groups. 

Recall that continuous data are quantitative and typi- 
cally measure or count something such as bone mineral 
density, blood loss, or time to healing. Categorical data 
can be dichotomous, which has two possible outcomes 
such as work status (back to work, not back to work), reo- 
peration status (reoperation, no reoperation), or mortality 
(alive, dead). Categorical data can also be nominal, which is 
qualitative and has no rank order such as legal status 
(workers compensation, liability suit, no litigation). Cate- 
gorical data can also be ordinal, which is similar to nominal 
except that the categories are ranked such as heterotopic
ossification (none, some, focal, diffuse) or patient
satisfaction (poor, good, excellent).


Jargon Simplified: Data Types 

• Continuous data are quantitative and measure or
count something.

• Categorical data are qualitative or grouped into
categories and can be either nominal (does not have an
order) or ordinal (has a rank order).


Test Usage 


As stated earlier, the specific test and the way we prepare 
the data depend on what type of data are being analyzed 
such as continuous or categorical. It is also dependent on 
the number of treatments to be compared. Two is the mini- 
mum number of treatments that can be compared and the 
analysis for two treatment groups is different than the 
methods used for three or more groups. For the remainder 
of this chapter, we will focus on comparing means and pro- 
portions of only two treatment groups. The appropriate 
analysis of three or more treatment groups will be ad- 
dressed in a later chapter. 

The independent means Student’s t test is commonly
used when the data are continuous, there are two
independent treatment groups being compared, and the data
are approximately normally distributed. Independent
treatment groups are those where the value of the out- 
come in one treatment group is totally independent of 
those in the other treatment group. For example, we could 
compare bone mineral density of the proximal femur after 
6 months of bisphosphonate at 35 mg weekly versus 70 mg 
weekly. The data are continuous and the treatment groups 
are independent of each other. Likewise, we could compare 
knee ROM following patellar tendon transfer versus ham- 
string for anterior cruciate ligament repair. Again, the data 
are continuous and the treatment groups are independent. 
For both of these examples, an independent means t test is 
appropriate for comparison. This test specifically assesses 
whether the difference between the mean outcomes of 
each of the two treatment groups is significantly different 
from zero. This then indicates whether the difference be- 
tween the mean outcomes of each of the two treatment 
groups is due to random chance or is due to the treatments. 
From the t test, the probability that the difference in mean 
outcome is due to chance only is reported, which is the P 
value. 

Note that the study design, specifically the hypothesis, 
dictates how the t test is actually conducted. It can be a 
two-tailed or one-tailed test. A two-tailed test is used 
when you do not specifically hypothesize that one treat- 
ment has a higher (or lower) value than the other, you 
merely hypothesize that the two treatments are different 
(could be either higher or lower). A one-tailed test is used
when you specifically hypothesize that one treatment
has a higher (or lower) value than the other, such as when
comparing the bone mineral density of bisphosphonate-treated
patients to that of untreated patients.⁷ You may specifically
hypothesize that the treated patients have a higher bone mineral
density than those untreated and as such would use a 
one-tailed t test. 


Jargon Simplified: One- and Two-Tailed Tests 

• A one-tailed test is appropriate when the hypothesis
specifically states that one treatment is higher (or
lower) than the other.

• A two-tailed test is appropriate when the hypothesis
merely states there is a difference between treatments
that could be higher or lower.
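
The independent means t test and the one- versus two-tailed choice can be sketched in a few lines. The code below is an illustration, not a validated statistical routine: it assumes equal variances (a pooled estimate), obtains the tail probability of the t distribution by simple numerical integration, and uses hypothetical ROM values. The one-tailed result is only meaningful when the observed difference lies in the hypothesized direction.

```python
import math
from statistics import mean, variance

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0 under Student's t distribution (Simpson's rule)."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    pdf = lambda x: coef * (1 + x * x / df) ** (-(df + 1) / 2)
    h = t / steps
    area = pdf(0) + pdf(t)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - area * h / 3  # half the distribution minus the area from 0 to t

def independent_t_test(a, b, tails=2):
    """Equal-variance independent means t test; returns (t, df, P)."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    p_one = t_upper_tail(abs(t), df)
    return t, df, (2 * p_one if tails == 2 else p_one)

# Hypothetical knee ROM in degrees (illustrative values only)
patellar = [128, 132, 130, 134, 126, 130]
hamstring = [122, 126, 124, 128, 120, 124]

t_stat, df, p_two = independent_t_test(patellar, hamstring, tails=2)
_, _, p_one = independent_t_test(patellar, hamstring, tails=1)
print(f"t = {t_stat:.3f}, df = {df}, two-tailed P = {p_two:.4f}, one-tailed P = {p_one:.4f}")
```

The one-tailed P value is exactly half the two-tailed value, which is why the choice of tails has to follow from the hypothesis and not from the data.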


The paired t test is commonly used when the data are
continuous, there are two dependent treatment groups being
compared, and the data are approximately normally
distributed.⁸ Dependent treatment groups are those where
the value of the outcome in one treatment group is depen- 
dent on those of the other treatment group. We can use si- 
milar examples as above to illustrate the use of a paired t 
test. We could compare bone mineral density of the prox- 
imal femur after 6 months of bisphosphonate to the pre- 
treatment bone mineral density value to determine if 
there is an increase in bone mineral density. The data are 
continuous and the two outcomes (bone mineral density) 
are dependent because the final density is somewhat
dependent on the initial density. Likewise, we could compare
knee ROM following patellar tendon transfer for anterior 
cruciate ligament repair of the treated knee to the contral- 
ateral knee. Again, the data are continuous and the two 
outcomes (ROM) are dependent because the range of mo- 
tion of the treated knee is somewhat dependent on the in- 
herent (contralateral) ROM of the knee. This test specifi- 
cally assesses whether the mean pairwise difference in 
outcomes is significantly different from zero. This then in- 
dicates whether the pairwise differences are due to ran- 
dom chance or are due to the treatments. From the paired 
t test, the probability that the pairwise difference is due to 
chance only is reported, which is the P value. Paired tests 
are powerful in that they can normalize the data and elim- 
inate much intersubject variability. 

The paired t test can also be one- or two-tailed depend- 
ing on the hypothesis. In our examples, it may be appropri- 
ate to conduct a one-tailed t test for the bone mineral den- 
sity comparison with the hypothesis being that the treat- 
ment increases density, whereas it may be most appropri- 
ate to conduct a two-tailed paired t test for the ROM com- 
parison with the hypothesis being that reconstruction 
changes (increases or decreases) ROM. 
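
A paired t test is simply a one-sample t test on the pairwise differences, which the following sketch makes explicit. As before, this is an illustration under stated assumptions: the tail probability comes from simple numerical integration, the function names are hypothetical, and the ROM values for the treated and contralateral knees are invented.

```python
import math
from statistics import mean, stdev

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0 under Student's t distribution (Simpson's rule)."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    pdf = lambda x: coef * (1 + x * x / df) ** (-(df + 1) / 2)
    h = t / steps
    area = pdf(0) + pdf(t)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - area * h / 3

def paired_t_test(first, second, tails=2):
    """Paired t test on matched observations; returns (t, df, P)."""
    diffs = [x - y for x, y in zip(first, second)]  # one difference per subject
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    p_one = t_upper_tail(abs(t), n - 1)
    return t, n - 1, (2 * p_one if tails == 2 else p_one)

# Hypothetical ROM in degrees for the same six patients (illustrative only)
treated = [110, 115, 108, 120, 112, 117]
contralateral = [116, 119, 113, 127, 117, 120]

t_stat, df, p_two = paired_t_test(treated, contralateral, tails=2)
print(f"t = {t_stat:.3f}, df = {df}, two-tailed P = {p_two:.5f}")
```

Because each patient serves as his or her own control, the spread of the differences is much smaller than the spread between patients, which is where the paired test gets its power.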

For comparison of categorical data, we use a different 
type of analysis than the t test. There is no mean value 
for a set of categorical data, thus tests based on mean va- 
lues (such as the t test) are inappropriate. Note that catego- 
rical data should not arbitrarily be assigned numeric values 
and then analyzed as continuous data. In other words, if 
you were measuring status of fracture union (nonunion, 
protracted union, or union) at 12 weeks postoperatively 
and comparing plating versus nailing, it is not appropriate 
to assign all nonunions a value of 1, all protracted unions a 
value of 2, and unions a value of 3 to carry out a t-test ana- 
lysis. This potentially will give you erroneous results. 

For comparison of categorical data, a chi-square (χ²)
analysis is appropriate.⁹ The χ² analysis is commonly used
when the data are categorical and there are at least five ob- 
servations in each category. The analysis is based on the 
distribution of frequencies in each category, or the propor- 
tion of observations within each category. For example, we 
could compare rate of infection (infection or no infection) 
of open tibial shaft fractures associated with systemic anti- 
biotic administration alone compared with systemic plus 
local antibiotic (bead) administration. The data are catego- 
rical (and dichotomous). Likewise, we could compare bod- 
ily pain on an SF-36 health survey following either ham- 
string or patellar tendon transfer for an anterior cruciate li- 
gament (ACL) reconstruction. The data are categorical (and 
ordinal). For both of these examples, a χ² analysis is
appropriate for comparison. This test specifically assesses
whether the distribution (or proportion) of frequencies 
in each category is due to the treatments or merely due 
to random chance. In other words, the analysis determines 
whether or not the proportion of outcomes in each cate- 
gory is dependent on the treatment or whether it is just 
due to chance. From the χ² analysis, the probability that
the distribution (or proportion) is due to random chance
only is reported, which is the P value.


Key Concepts: When to Use Which Test 

• Independent means t test is used when the data are
continuous, there are two independent treatment
groups being compared, and the data are approximately
normally distributed.

• Paired t test is used when the data are continuous,
there are two dependent treatment groups being
compared, and the data are approximately normally
distributed.

• Chi-square test is used when two treatments are being
tested, the data are categorical, and there are at least
five observations in each category.


Conducting a Test 


Once your data are collected and organized, conducting a t
test on continuous data from two treatment groups is very
simple. For example, Microsoft Excel has the t test as a
standard function.

In Fig. 43.1, data are shown for our example where we are 
comparing knee ROM following patellar tendon transfer 
versus hamstring for ACL repair. Again, the data are contin- 
uous and the treatment groups are independent. The data 
for each of the subjects are entered in columns with subjects 
001 through 020 (patellar tendon) being compared with 
subjects 101 through 120 (hamstring). As a first step, the 
data can be “described” by calculating the mean and stan- 
dard deviation of each dataset. This is shown at the bottom 
of each column of data using Excel’s built in functions “aver- 
age” and “stdev.” The syntax is shown in Fig. 43.1. 





Fig. 43.1 Range of motion data for the two treatment groups
are entered into the Microsoft Excel program in preparation for
analysis. Below the first data column, the cells contain
=average(B5:B24) (Mean) and =stdev(B5:B24) (Stdev).

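
For readers working outside Excel, the same descriptive step can be sketched with Python's standard library. The values below are hypothetical and stand in for one column of ROM data; `statistics.stdev` computes the sample standard deviation (dividing by n - 1), which corresponds to Excel's stdev function.

```python
from statistics import mean, stdev

# Hypothetical knee ROM in degrees for one treatment group (illustrative only)
rom = [128, 132, 130, 134, 126, 130]

rom_mean = mean(rom)   # counterpart of =average(...)
rom_sd = stdev(rom)    # counterpart of =stdev(...), sample SD with n - 1
print(f"Mean = {rom_mean:.1f}, Stdev = {rom_sd:.2f}")
```
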
The t test can then be conducted by highlighting a single
nearby cell and selecting from the menu Insert, then
Function, then by typing ttest in the dialog box that pops up.
Another dialog box opens up like the one shown in Fig. 
43.2, which assists in properly conducting the analysis. 
The dialog prompts for four inputs: Array 1, Array 2, Tails, 
and Type. 

Array 1 corresponds to the range of cells that contain the 
data from the first treatment group. In our example, this is 
the patellar tendon group (subjects 001 to 020), whose 
data lies in cells B5 through B24. The data can be entered 
as shown in Fig. 43.3. Array 2 corresponds to the range of 
cells that contain the data from the second treatment 
group, which is hamstring (subjects 101 to 120) in cells 
D5 through D24. As shown in Fig. 43.4, at the Tails prompt,
a 2 can be entered for a two-tailed test (or 1 for a one-tailed
test depending on the hypothesis). Finally, at the Type
prompt, Excel needs to know what type of analysis to
run: paired t test (corresponding to an entry of 1), equal
variance independent means t test (corresponding to an
entry of 2), or unequal variance independent means t
test (corresponding to an entry of 3). In our example, the
appropriate entries are shown in Fig. 43.4. After clicking
on OK, the P value is calculated and displayed in the cell.
In our example, the P value is 0.0006, indicating significant
differences in ROM between treatment groups.

Fig. 43.2 Initiating the TTEST function opens a dialog box for
conducting the analysis.

Fig. 43.3 Excel’s TTEST function dialog box prompts for four
inputs necessary to conduct the t test.

Fig. 43.4 For the anterior cruciate ligament repair example, the
dialog box is completed as shown. The P value returned from the
analysis is P = 0.0006, which indicates statistically significant
differences between treatment groups.

Using Excel, the paired t test is conducted in a similar
fashion. In Fig. 43.5 data are shown for our example where
we are comparing knee ROM of the treated knee relative to
the contralateral knee following patellar tendon transfer
for ACL repair. As we previously discussed, the appropriate
analysis for these data is a paired t test. As an additional
step in the paired t test, a column can be created to calculate
the pairwise difference between values, as shown in Fig.
43.5. The t test is conducted using the same function as
described in the previous example with the only difference
being that at the Type prompt, a 1 is entered as shown in
Fig. 43.6 to indicate that the type of test is a paired t test.
After clicking on OK, the P value is calculated and displayed
in the cell. In our example, the P value is 0.0000015,
indicating highly significant differences in ROM between the
treated and contralateral knees. The result of the analysis
is shown in Fig. 43.7.

Fig. 43.5 Range of motion data for the treated and contralateral
knee are entered into the Microsoft Excel program in preparation
for analysis. A third column has been set up to calculate the
difference between knees for each subject.

Fig. 43.6 The P value returned from the analysis is P = 0.000015,
which indicates statistically significant differences between
treated and contralateral range of motion.

Mean 111.5 104.8 6.7
Stdev 5.7 5.5 4.3

Fig. 43.7 The result of the t test is shown in cell F4 and indicates a
highly statistically significant difference in range of motion
between the treated and contralateral knee.

Conducting the χ² analysis involves a couple of extra
steps, but can easily be completed in Excel. In our example
comparing rate of infection (infection or no infection) of
open tibial shaft fractures associated with systemic
antibiotic administration alone compared with systemic plus
local antibiotic (bead) administration, the data are
categorical (and dichotomous), so a χ² analysis is appropriate.


The first step in the analysis is to compare the expected 
frequencies to the observed frequencies in each category 
using a contingency table. The contingency table is shown 
in Fig. 43.8 with the appropriate values entered. To deter- 
mine the expected frequencies, a simple calculation must 
be used for each combination of treatment and outcome: 


$$E = \frac{(\text{total of outcome}) \times (\text{total of treatment})}{\text{total observations}}$$


where E is the expected frequency for each combination of 
treatment and outcome. The expected frequencies for our 
example are shown in the lower table in Fig. 43.8. Once the 
contingency table for the observed and expected propor- 
tions is complete, the analysis is simple. 
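
The expected-frequency calculation can be sketched as follows. The observed counts are read from the CHITEST dialog in Fig. 43.9 (8 and 2 infections, 92 and 98 without infection); which column corresponds to which antibiotic regimen is not recoverable from the figure, so the column order here is an assumption, and the function name is hypothetical.

```python
def expected_frequencies(observed):
    """Expected count for each cell: (row total x column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Rows: infection, no infection. Columns: the two treatments (order assumed).
observed = [[8, 2],
            [92, 98]]

expected = expected_frequencies(observed)
print(expected)
```

Each expected count answers the question: if infection were unrelated to treatment, how many patients in this treatment group would we expect to fall in this outcome category?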

The χ² test can then be conducted by highlighting a single
nearby cell and selecting from the menu Insert, then
Function, then by typing chitest in the dialog box that 
pops up. Another dialog box opens up like the one shown 
in Fig. 43.9, which assists in properly conducting the ana- 
lysis. The dialog prompts for only two inputs: Actual_range 
and Expected_range. 

Actual_range corresponds to the range of cells that con- 
tain data from the observed contingency table. In our ex- 
ample, this lies in cells C2 through D3. The data can be en- 
tered as shown in Fig. 43.9. Expected_range corresponds 
to the range of cells that contain data from the expected 
frequencies table in cells C8 through D9. After clicking on
OK, the P value is calculated and displayed in the cell. In
our example, the P value is 0.0515, indicating that no
dependence of infection rate on treatment could be detected.
In other words, we cannot accept our hypothesis, and according
to our data there is no statistically significant difference in
incidence of infection using systemic antibiotics alone
compared with systemic antibiotics in conjunction with beads.
(Keep in mind that we cannot conclude that the treatments
are equal until calculating the power of the test, 1-β.)

Fig. 43.8 The data are organized into a contingency table to
compare rate of infection to antibiotic treatment. The upper
table shows the observed frequency distribution of the data.
The lower table shows the calculated expected frequencies
based on the observed data.

Fig. 43.9 Excel’s CHITEST function dialog box prompts for two
inputs necessary to conduct the chi-square analysis. In the
dialog, Actual_range C2:D3 = {8,2;92,98}, Expected_range
C8:D9 = {5,5;95,95}, and the returned value is 0.051575875.
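
The value returned by the CHITEST dialog can be reproduced by hand. The sketch below sums (observed - expected)² / expected over the four cells and converts the result to a P value; it relies on the fact that a 2 × 2 table has one degree of freedom, for which the upper tail of the χ² distribution equals erfc(√(χ²/2)). The counts are those shown in the Fig. 43.9 dialog, and the function name is hypothetical.

```python
import math

def chi_square_2x2(observed, expected):
    """Chi-square statistic and P value for a 2 x 2 table (1 degree of freedom)."""
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    # Valid for 1 degree of freedom only: P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

observed = [[8, 2], [92, 98]]     # infection / no infection, per treatment
expected = [[5, 5], [95, 95]]     # from the expected frequencies table

chi2, p = chi_square_2x2(observed, expected)
print(f"chi-square = {chi2:.3f}, P = {p:.4f}")
```

For larger tables the degrees of freedom change and this shortcut no longer applies, so a general χ² distribution function would be needed.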

The preceding analysis is identical to that which is ap- 
propriate for the problem posed at the beginning of the 
chapter. If we are interested in determining whether or 
not there are differences in risk of cutout from using a 
proximal femoral nail versus a dynamic hip screw follow- 
ing treatment of unstable trochanteric fractures, the data 
will be categorical and dichotomous (cutout or no cutout), 
so the χ² test is appropriate and can be conducted easily
using Excel. 

For our final example in this chapter, we will compare
bodily pain on an SF-36 health survey following either
hamstring or patellar tendon transfer for ACL reconstruction.
The data are categorical (and ordinal) so again a χ²
analysis is appropriate. We construct the contingency tables
in the same manner as the previous example, as
shown in Fig. 43.10. Using Excel’s chitest function, the
analysis indicates that bodily pain is highly dependent on
treatment (P < 0.0001) and thus we can accept our hypothesis
that there is a statistically significant difference in bodily
pain following hamstring versus patellar tendon ACL
reconstruction.

Fig. 43.10 The data are organized into a contingency table to
compare SF-36 health survey’s bodily pain to treatment. The
upper table shows the observed frequency distribution of the
data. The lower table shows the expected frequencies based on
the observed data.

The t test and χ² analysis are relatively simple to conduct
and they are appropriate for many different datasets and
data types. However, when comparing more than two
different treatment groups, alternate tests must be used as
detailed in the chapters that follow. When analyzing data
from more than two treatment groups, it is not appropriate
to conduct multiple t tests on all combinations of treatment
groups to look for differences. This can cause an
accumulation of errors and misleading results.¹⁰
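
The accumulation of error can be illustrated with a short calculation. The sketch assumes, as a simplification, that the pairwise comparisons are independent of one another (they are not strictly so, because each group appears in several comparisons), in which case the chance of at least one false positive among k tests is 1 - (1 - α)^k. The function name is hypothetical.

```python
from math import comb

def familywise_error(n_groups, alpha=0.05):
    """Number of pairwise comparisons and the chance of at least one false positive.

    Assumes independent comparisons, which is only an approximation here.
    """
    n_comparisons = comb(n_groups, 2)
    return n_comparisons, 1 - (1 - alpha) ** n_comparisons

for groups in (2, 3, 5):
    k, error = familywise_error(groups)
    print(f"{groups} groups: {k} comparisons, chance of a false positive = {error:.3f}")
```

With three groups the chance of at least one false positive is already about 14%, and with five groups it is about 40%, far above the intended 5%.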

In the biomedical field, P < 0.05 indicates a statistically
significant difference. However, this does not necessarily
indicate a clinically relevant difference. A critical
assessment of the data is often necessary to determine whether
or not the statistical analysis indicates something that is
meaningful and does not just satisfy statistical testing.


Conclusions


Statistical testing shows us how reliable an extrapolation
of an observed difference to the total population
would be, i.e., whether it is due to chance or genuine
differences in the population. The P value indicates how likely
a difference as large as the one observed in the study sample
would be if no difference really existed in the population;
declaring a difference that does not exist is a type I error.
Statistical significance is often accepted when a test yields
a P value equal to or smaller than 0.05. However, a P value
does not indicate the probability of observing no difference in
the study sample, while a difference does exist in the
population (i.e., a type II error). The probability of a type
II error is seldom reported, but should be kept in mind when
assessing non-significant differences, since generally a
test is more likely to yield a type II error than a type I error.
A conventional statistical test for comparing two means
(of continuous data) is the t test, whereas proportions (of
categorical data) are mostly tested by a χ² test. Choosing
the right test for your data increases the validity of your
conclusions. However, a careful interpretation of P values,
keeping in mind that no observed difference or equality is
absolutely sure, is mandated, irrespective of which test is
used.
Regression Analysis


“Wisdom begins in wonder.”
- Socrates


Summary


In the following chapter, we explore situations in which
the dataset comprises two or more continuous variables.
Simple linear regression analysis is a useful approach for
characterizing the relationship between such variables.
After a discussion regarding how regression analysis is
performed, several statistical parameters related to
regression analysis are introduced that measure the strength
of the relationship between continuous variables. The
chapter concludes with a brief discussion of some of the
various types of regression analysis.


Introduction


Regression is any statistical method where the mean of one
or more random variables is predicted based on other
measured random variables. Simply put, we might want to
know what factors (or variables) are associated with (or
predict) the need for revision surgery among patients
treated surgically for tibial shaft fractures. A regression
analysis can help us understand what patient factors (e.g.,
age, gender, comorbidity, and fracture severity) are
associated with an increased risk of revision surgery after
an initial operation. Regression analyses may also help us
identify which factors are associated with the length of
hospital stay.

In this chapter, we turn our attention to situations in
which the independent and dependent variables are both
continuous. You are bound to encounter such situations in
the medical literature all too frequently. For instance, a
study can assess any of several relationships between two
continuous variables: the effect of aging on bone density,
the association between length of surgery and resultant
blood loss, or the correlation between hours worked and
frequency of mistakes made by residents, to name a few. The
statistical approaches introduced in the previous chapters
are less than ideal for dealing with two continuous
variables. Inevitably, this poses several important
questions: What statistical approaches should be used to
handle such data? How do these tests work? What exactly are
they telling us? This chapter serves to answer these key
questions.


Jargon Simplified: Independent and Dependent 
Variable 

Independent variable: A variable that predicts or ex- 
plains another variable. 

Dependent variable: A variable that is a function of, or 
can be explained by, the independent variable. 


A change in the independent variable results in a pre- 
dictable change in the dependent variable. 


Finding a Relationship between the Variables 


Let us suppose we are concerned that medical trainees 
have undue stress. Figure 44.1 compares stress levels to 
hours worked among medical residents, with the depen- 
dent variable being stress levels measured on a scale of 0 
to 100 (0 being no stress and 100 being maximal stress). 
The independent variable, hours worked, is the number 
of hours worked per week. When dealing with two contin- 
uous variables, our goal is to find whether a relationship ex- 
ists between them and how strong that relationship is. In 
other words, if we know the value of one variable, are we 
able to predict what the value of the other variable is? Be- 
fore delving into the math, consider the graphs below. 


By a quick visual check, it is apparent that no relation- 
ship exists between the two variables in Fig. 44.1. The 
data points are scattered horizontally, indicating that a 
change in hours worked does not result in a predictable 
change in a resident’s stress level. In Fig. 44.2, we do in 
fact see a positive correlation — an increase in weekly hours 
worked is accompanied by an increase in levels of stress. 
Thus, on a visual check, we may conclude that a relation- 
ship exists in which the independent variable (hours 
worked) explains some of the change in the dependent 
variable (stress levels). Is the relationship stronger in Fig. 
44.2a or 44.2b? You may be inclined to think that the
relationship in Fig. 44.2b is stronger, given that the slope
appears steeper. However, the two graphs comprise the same
data points, just displayed on two different scales. The
strength of the relationship between stress levels and
weekly hours worked is indeed the same for both graphs.
Always be aware of the scaling “tricks” that can sometimes
be used to make data seem more important than they really
are.¹


What if we had changed the dataset in Fig. 44.2b so that 
the slope was in fact larger than that in Fig. 44.2a - would it 
now be appropriate to conclude that the relationship be- 
tween the two variables is stronger in Fig. 44.2b? Abso- 
lutely not; it is important to understand that the slope of 
a graph is not a measure of the strength of the relationship 
between two variables. A simple manipulation of the 
scales used to measure either of the variables could alter 
the slope.² For instance, the independent variable measured
in hours per week could be converted to minutes per week - a
systematic change in our data points that would alter the
slope, but have no effect on the relationship between the
two variables.
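
The point about rescaling can be checked numerically. The following is a minimal sketch in Python, assuming a small made-up dataset (the `hours` and `stress` values are hypothetical and are not taken from Fig. 44.2): converting hours to minutes changes the slope by a factor of 60 but leaves the strength of the linear relationship, measured here by the correlation coefficient r introduced later in the chapter, unchanged.

```python
import math


def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def pearson_r(x, y):
    """Correlation coefficient between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, for illustration only.
hours = [40, 50, 60, 70, 80, 90]    # hours worked per week
stress = [35, 42, 55, 58, 70, 81]   # stress score, 0 to 100
minutes = [h * 60 for h in hours]   # same data on a different scale

print("slope per hour:  ", slope(hours, stress))
print("slope per minute:", slope(minutes, stress))
print("r (hours):       ", pearson_r(hours, stress))
print("r (minutes):     ", pearson_r(minutes, stress))
```

The two slopes differ by exactly the conversion factor, whereas the two values of r agree, which is why the slope alone says nothing about the strength of the relationship.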

So how do we measure the strength of a relationship between
two continuous variables? It is actually quite simple. The
closer the data points fall to a straight line, the stronger
the linear relationship between the variables.¹ Consider
Fig. 44.3.

Looking at the two graphs in Fig. 44.3, it is apparent that
both graphs represent a positive relationship between stress
levels and hours worked. That is, an increase in hours
worked is accompanied by an increase in stress levels.
However, the points in Fig. 44.3a lie much closer to the
fitted line, whereas those in Fig. 44.3b are scattered more
widely above and below the fitted line. In Fig. 44.3a, the
fitted line (which is a function of x) better captures the
variance in y. In other words, by knowing the value of x
(hours worked), we are better able to predict the value of y
(stress level).

How exactly is the line of best fit created to capture the
greatest amount of variance of the data points? The process
is known as linear regression analysis.


Linear Regression Analysis


Consider the graph below (Fig. 44.4). The data represent
the resulting blood loss in femoral neck fracture patients
treated with hemiarthroplasty.

Fig. 44.4 The graph plots operation length versus blood loss in
femoral neck fracture patients treated with hemiarthroplasty and
depicts the line of best fit.

The value of y (blood loss) for any given value of x
(minutes) can be predicted by analyzing the fitted line.
This predicted value, ŷ (known as y-hat), can be explained
by the basic formula for a straight line,

$\hat{y} = mx + b$    Eq. 44.1

where m is the slope of the fitted line and b is the
y-intercept. However, as the graph depicts, not all the
points lie on the fitted line, and the above formula can
only predict the value of blood loss with some error in
most cases.

The true value of y is expressed by the following formula:

$y = mx + b + \varepsilon$    Eq. 44.2

The error term, ε, explains the variance of the points from
the fitted line due to factors, other than operation length,
that influence blood loss. The error term is also known as
the residual variance.² The value of the error term for
each point on the above graph is the vertical distance
between the actual point and the fitted line, as spanned by
the green line. Figure 44.5 demonstrates the error terms
associated with the first three data points in Fig. 44.4
with a magnified view.

The goal of regression analysis is to find values for the 
slope and y-intercept that will yield a line of best fit that 
explains the variance in y to the highest degree possible. 
Essentially, we need a line that lies as close as possible to 
all the data points, so that the residual variance is mini- 
mized. This introduces the concept of linear regression 
analysis. Formally stated, linear regression analysis is the 
creation of a line of best fit that minimizes the sum of 
squares of all the error terms (sum of squares residual). 
This is also known as least-squares regression analysis. 
The product of this analysis, the fitted line, is known as 
the least-squares regression line.' It is important to note 
that the error terms are not summed directly because 
some are positive values while others are negative, which 
will cancel each other if summed directly. Rather, regres- 
sion analysis sums the square of each error term so that 
all values are positive. 
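
The least-squares idea can be sketched in a few lines of Python. This is an illustration only: the data points behind Fig. 44.4 are not listed in the text, so the `minutes` and `blood_loss` values below are hypothetical, and the function names are our own. The slope and intercept are obtained from the standard closed-form least-squares formulas, and the sum of squares residual is then computed for the fitted line.

```python
def least_squares_line(x, y):
    """Return (m, b) for the line y-hat = m*x + b that minimizes
    the sum of squared vertical distances to the data points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b


def sum_squares_residual(x, y, m, b):
    """Sum of the squared error terms (y - y-hat) for a given line."""
    return sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, for illustration only (not the data of Fig. 44.4).
minutes = [30, 40, 50, 60, 70, 80]           # length of operation
blood_loss = [310, 318, 329, 335, 347, 352]  # blood loss in mL

m, b = least_squares_line(minutes, blood_loss)
print("slope:", m, "intercept:", b)
print("sum of squares residual:", sum_squares_residual(minutes, blood_loss, m, b))
```

Any other choice of slope or intercept gives a larger sum of squares residual for the same data, which is what "best fit" means in this context.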


Jargon Simplified: Linear Regression Analysis and
Least-Squares Regression Line

Linear regression analysis: A statistical approach for
measuring the linear relationship between two continuous
variables by creating a line of best fit, which best
captures the variance in the dependent variable.

Least-squares regression line: The line of best fit that is
produced by linear regression analysis. It is defined by
the general formula, y = mx + b.


Fig. 44.5 The graph plots operation length versus blood loss in
femoral neck fracture patients treated with hemiarthroplasty and
depicts the line of best fit.


Mathematically, linear regression analysis minimizes the
following expression:

$\text{Sum of squares residual} = \sum (y - \hat{y})^2$    Eq. 44.3

For Fig. 44.4, the equation produced by the regression
analysis is y = 0.8857x + 283.29. Thus, the slope of the
fitted line is 0.89, indicating that for every additional
minute of surgery there is an increased blood loss of
0.89 mL. Although the y-intercept is 283 mL, it should be
understood that this value is meaningless - an operation of
zero minutes cannot correspond with a blood loss of 283 mL.
It is important to note that although the equation produced
by regression analysis enables us to project values beyond
our given data points, it must be done cautiously to ensure
that such projections are meaningful and truthful.¹ Although
the math is not displayed, our regression analysis
corresponds with a sum of squares residual of 94.3.


Coefficient of Determination and Correlation Coefficient 


Regression analysis provides values for the slope and y-in- 
tercept, which maximize the variance in y captured by the 
line of best fit. However, we have yet to discuss how to 
measure the magnitude of variance that is actually cap- 
tured by this line. For that, it is essential to introduce the 
coefficient of determination. The coefficient of
determination, expressed as r², measures the percentage of
variation in y that is accounted for by the least-squares
regression. It can also be viewed as the percentage of total
variation in the dependent variable that is explained by the
independent variable.²

$r^2 = \frac{\text{variance of } \hat{y}}{\text{variance of } y}$    Eq. 44.4

or

$r^2 = \frac{\text{sum of squares regression}}{\text{sum of squares regression} + \text{sum of squares residual}}$    Eq. 44.5


The numerator in Eq. 44.4, the variance of ŷ, is the sum of
squares of (ŷ - ȳ), the difference between the fitted point
that corresponds to each of the actual data points and the
mean value of the dependent variable. This is a measure of
the variance that the dependent variable y would have if
there was no scatter around the fitted line and all the
variability could be explained by x. Looking at Fig. 44.4,
the difference between each fitted point and the mean is
captured by the gray line. The sum of the squares of
(ŷ - ȳ) is also known as the sum of squares regression, as
noted in Eq. 44.5.

The denominator in Eq. 44.4, the variance of y, is the sum
of squares of (ŷ - ȳ) plus the sum of squares of (y - ŷ).
This is a measure of the total variance of the dependent
variable, including the variance accounted for by the
regression line and the unexplained variance that causes
scatter around the line. In other words, and as indicated in
Eq. 44.5, this is equivalent to adding the sum of squares
regression to the sum of squares residual. Going back to
Fig. 44.4, this would entail summing the squares of both the
gray and green lines.


Jargon Simplified: Coefficient of Determination 

The coefficient of determination, r², measures the
percentage of variation in y that is accounted for by the
least-squares regression.¹


Although we have elucidated the process of regression 
analysis and the method by which to measure the magni- 
tude of variance captured by the line, we have yet to ex- 
plain how one determines the strength of the relationship 
between two continuous variables. Such a measure is 
highly related to the coefficient of determination. In fact, 
the strength of a relationship is simply found by taking 
the square root of r² (and giving it the sign of the slope),
which yields a value known as the correlation coefficient, r.
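
As a worked check, Eq. 44.5 can be applied to the sums of squares reported for the blood loss example in Table 44.1 (343.214 for the regression and 94.286 for the residual). The short Python sketch below uses only those published figures and the fitted slope of 0.8857; the variable names are our own.

```python
import math

# Sums of squares reported in Table 44.1 for the data of Fig. 44.4.
ss_regression = 343.214
ss_residual = 94.286

# Eq. 44.5: explained variation divided by total variation.
r_squared = ss_regression / (ss_regression + ss_residual)

# The correlation coefficient is the square root of r squared,
# carrying the sign of the fitted slope (0.8857, i.e., positive).
fitted_slope = 0.8857
r = math.copysign(math.sqrt(r_squared), fitted_slope)

print("r squared:", round(r_squared, 3))
print("r:", round(r, 3))
```

In words, roughly 78% of the variation in blood loss is accounted for by the length of the operation in this example.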


Jargon Simplified: Correlation Coefficient

The correlation coefficient, r, is a measure of both the
strength and direction of a linear relationship between
two continuous variables.¹


The value of r can range from -1 to 1. The closer the value
of r is to zero, the weaker the existing relationship
between the two variables. Conversely, values of r that
approach -1 or 1 reflect stronger relationships between the
variables. If the value of r is -1 or 1, the two variables
have a perfectly linear relationship with all the data
points falling in a straight line. The sign of r indicates
the direction of the relationship. Positive values of r
indicate a relationship with a positive slope - as the
values of the independent variable increase so do values of
the dependent variable. Accordingly, the opposite is true
for negative values of r - as the values of the independent
variable increase the values of the dependent variable
decrease.¹

From the data in Fig. 44.4, we find that the coefficient of
determination and the correlation coefficient are:

r² = 0.784
r = 0.886

The utility of a scatter plot cannot be overstated when
assessing values of r. The correlation coefficient only
describes the strength of linear relationships, and does not
work for nonlinear relationships. Thus, it is imperative to
assess the overall pattern of the data points to determine
whether the correlation coefficient can provide any
meaningful measure of the strength of the relationship.


Statistical Significance


Before concluding this chapter, it is essential to describe
how the P value is calculated in regression analysis to
determine if the relationship between the two variables is
statistically significant. The P value is determined by
carrying out an F test. This test is based on a
signal-to-noise ratio, that is, the ratio of the variability
in y that is explained by x to the variability that is not
explained by x.

$F = \frac{\text{Mean squares regression}}{\text{Mean squares residual}}$    Eq. 44.6

Before carrying out the F test, both the numerator and
denominator must be converted to their mean squares.

For the sum of squares regression, the mean square is the
average value of the squared difference between the fitted
points and the mean. Accordingly, for the sum of squares
residual, the mean square is the average value of the
squared error term, ε. It is important to note that when
calculating mean squares, the sums of squares are not
divided by the number of data points. Rather, and for
reasons beyond the scope of this text, the numerator is
divided by 1 whereas the denominator is divided by n - 2,
with n being the number of data points. After determining
the ratio of the mean squares, known as the F statistic,
significance is determined by looking up the F value (i.e.,
the ratio) in a preexisting chart.² The F-test results from
Fig. 44.4 are provided in Table 44.1.


Table 44.1 F-Test Results

Model*       Sum of Squares   df   Mean Square   F        Significance
Regression   343.214          1    343.214       14.561   0.019*
Residual     94.286           4    23.571
Total        437.500          5

* Dependent variable: blood loss; independent variable: duration
of operation.
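
The entries of Table 44.1 can be reproduced from the two sums of squares and the number of data points (n = 6, since the total degrees of freedom are 5). The Python sketch below follows Eq. 44.6; the variable names are our own. The P value itself is not computed here; it would normally be read from an F table or obtained from statistical software (for instance, the survival function of the F distribution in SciPy, if that library is available).

```python
# Values taken from Table 44.1.
n = 6                    # number of data points (total df = n - 1 = 5)
ss_regression = 343.214
ss_residual = 94.286

# Degrees of freedom: 1 for the regression, n - 2 for the residual.
df_regression = 1
df_residual = n - 2

# Mean squares are the sums of squares divided by their df.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# Eq. 44.6: F is the ratio of the two mean squares.
f_statistic = ms_regression / ms_residual

print("mean square regression:", ms_regression)
print("mean square residual:  ", ms_residual)
print("F statistic:           ", f_statistic)
```

The result agrees with the mean squares (343.214 and 23.571) and the F value (14.561) listed in the table, apart from rounding.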
Additional Types of Regression


There are several other types of regression that may be 
performed depending on the type of dataset. When deter- 
mining the linear relationship between one dependent 
variable and several independent variables, multiple re- 
gression analysis is performed. Fundamentally, multiple 
regression analysis operates in a similar fashion to simple 
linear regression. Thus, the equation produced by multiple 
regression appears as follows: 


$y = m_1 x_1 + m_2 x_2 + m_3 x_3 + \cdots + m_n x_n + b$    Eq. 44.7


In this equation, n is the number of independent variables 
and each independent variable has a corresponding slope. 
Another commonly encountered type of regression is lo- 
gistic regression. This is performed for situations we have 
not yet considered - when dealing with a categorical de- 
pendent variable and usually a continuous independent 
variable, or several continuous independent variables. 
Although this seems to violate the idea that regression ana- 
lysis is reserved for continuous variables, logistic regres- 
sion finds a clever method to convert the dependent vari- 
able into a probability score, which of course is a continu- 
ous numerical value between 0 and 1. Thus, a relationship 
can be found between the independent variables and the 
probability of an outcome in the dependent variable. 
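
The conversion to a probability can be sketched as follows. This is a simplified illustration under stated assumptions, not a fitted model: the slope `m` and intercept `b` below are hypothetical values chosen for the example, and the scenario (probability of revision surgery as a function of age) is invented. The standard logistic function maps the linear predictor mx + b to a value between 0 and 1.

```python
import math


def logistic(z):
    """Standard logistic function: maps any real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predicted_probability(x, m, b):
    """Probability of the outcome for a given value of the
    independent variable x, using the linear predictor m*x + b."""
    return logistic(m * x + b)


# Hypothetical coefficients, for illustration only.
m = 0.05    # change in log-odds per additional year of age
b = -4.0    # log-odds of the outcome at age zero

for age in (30, 50, 70, 90):
    p = predicted_probability(age, m, b)
    print(f"age {age}: predicted probability {p:.3f}")
```

Whatever the value of x, the output stays strictly between 0 and 1, which is what allows a categorical outcome to be related to continuous predictors.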


Conclusions 


When dealing with two continuous variables, the objective
is to determine whether a relationship exists between them.
Least-squares regression analysis provides a line of best
fit that maximizes the variance in the dependent variable
that can be explained by the independent variable. The
coefficient of determination, r², measures the fraction of
the total variation in the dependent variable that is
explained by the independent variable, whereas the
correlation coefficient, r, provides a measure of the
strength and direction of the relationship. Statistical
significance can be determined by carrying out an F test.
This is a ratio of the explained to unexplained variance in
the dependent variable, as measured by the mean square
regression and mean square residual, respectively. In
addition to simple linear regression, multiple regression
is utilized in situations involving several independent
variables and logistic regression is used when dealing with
a categorical dependent variable.


Key Concepts: Regression

• Linear regression analysis is used to measure the
relationship between two continuous variables.

• The least-squares regression line has a slope and
y-intercept such that the sum of squares residual (or sum
of squares of the error term) is minimized.

• The coefficient of determination, r², is the fraction of
variation in y that is explained by x. It can have a value
between 0 and 1.

• The correlation coefficient, r, measures the strength
and direction of the relationship between the dependent and
independent variables. It can have a value between -1 and
1. Values closer to -1 or 1 indicate stronger relationships
with the data points falling closer to a straight line.
Positive values of r indicate relationships with a positive
slope, whereas negative values represent negatively sloped
relationships.

• Multiple regression is utilized to find a linear
relationship between one dependent variable and several
independent variables.

• Logistic regression measures the relationship between a
categorical dependent variable and usually a continuous
independent variable (or several), by converting the
dependent variable to probability scores.
Analysis of Variance


“Statistician: A man who believes figures don’t lie, but admits that under analysis some of
them won’t stand up either.”
- Evan Esar


Summary


In an earlier chapter, we learned the application of a t
test to compare the mean of an outcome variable or a
response variable between two independent groups. Why go
further and what happens when we have more than two groups?
The goal of this chapter is to inform surgeons about
situations in which they wish to test for differences among
three or more independent means rather than just two. The
extension of the two-independent samples t test to three or
more samples is known as the analysis of variance.


Introduction


In the field of surgery, surgeons are often interested in
reporting outcomes or responses such as operative blood
loss, postoperative pain, and change in the range of motion
of a joint after surgery. These outcomes are most often
reported as means and we are often interested in comparing
the means between two or more groups of patients. When
claims are made in the surgical literature that a mean of an
outcome is significantly greater or smaller for a new
surgical technique compared with other techniques, surgeons
need to consider the validity of the evidence in support of
the claim.

The idea of comparing populations to draw a conclusion
about the similarities or differences has been around for
ages. In this chapter, a statistical method is introduced to
help us decide whether the differences between sample means
are too large to be attributed to chance alone. The method
is known as the analysis of variance (ANOVA) and it is a
quantitative method used to make simultaneous comparisons
between two or more means. The idea is to compare the mean
of a continuous dependent variable between the different
independent groups. Continuous variables, such as operation
time, size of a tumor, or cholesterol level, are measured on
a numeric or quantitative scale¹ and are not restricted to
taking on certain specified values. Instead of a t statistic
when using a t test, ANOVA uses an F statistic and its P
value to evaluate if all of several population means are
equal. In fact, when only two groups are compared, the F
statistic is the square of the t statistic. Although various
types of ANOVA exist, the basic principle of an ANOVA is the
same; however, the decision between alternative methods
must be made at the stage of study design. Here we describe
the ANOVA methods for independent samples and provide a
summary of key items to consider when conducting an ANOVA.


One-Way Analysis of Variance


In general, we are interested in comparing the mean of an
outcome variable or a response variable of k independent
patient populations. Let us assume that these patient
populations are independent and that the observations in
each population are normally distributed. Analysis of
variance is robust to departures from normality, but the
data in each group should be symmetric.¹ As its name
implies, the ANOVA test is dependent on estimates of spread
or dispersion or variances.² It assumes that the variances
are equal and estimates a pooled variance. Pooling gives
more weight to groups with larger sample sizes.¹ Because
the formal tests for equality of variances between several
groups are not as appropriate for an F test as they are for
a t test, we use the following rules.


Key Concepts: Rules for Examining Variance in ANOVA¹

• “If the largest standard deviation (square root of
variance) is less than twice the smallest standard
deviation, we can use methods based on the assumption of
equal variances and results will still be approximately
normal.”

• “If the largest standard deviation is more than twice the
smallest standard deviation, we attempt transformation.
Transformation often makes the group standard deviations as
well as the distributions of observations in each group
more nearly normal.”

• “If standard deviations are markedly different and cannot
be made similar by transformation, inference requires
different approach.”


Let us illustrate this with an example. Suppose that we 
plan to compare the mean operation time between three 
surgical techniques in a cohort of patients with hyper- 
trophic pyloric stenosis. The 63 patients included in the 
study are recruited from three different surgical centers 
and these patients have undergone one of the following 
surgical techniques: right upper quadrant (RUQ), laparo- 
scopic (LAP), and circumumbilical (UMB). Before combin- 
ing the patients into one single group to conduct the ana- 
lysis, we would like to examine some baseline characteris- 
tics to ensure that the patients from these different centers 
are in fact comparable. One characteristic that we would 
like to examine is age. If patients from one center are con- 
siderably older - or younger - than those from other cen- 
ters, the results of the study would be biased. Therefore, 
we would like to test the null hypothesis that the mean 
age of patients in all three surgical centers is identical. 
The alternative hypothesis would be that at least one of 
the population means differs from one of the others.” To 
carry on with our analysis, we need to estimate the 
mean, standard deviation, and variance of age for each sur- 
gical center. Figure 45.1 shows the 95% confidence interval 
(CI) for the mean age for each surgical center. From this fig- 
ure, a substantial amount of overlap of CIs suggests no dif- 
ferences of the mean age between the three surgical cen- 
ters, but we would like to perform a formal analysis. Pre- 
sented with the summary data in Table 45.1, the ratio of 
largest standard deviation to smallest standard deviation 
is smaller than 2. Assuming that the variances for the
three underlying populations are equal, we might attempt to
compare the population means two at a time, for all possible
pairs of sample means, using a two-sample t test. We would
have to perform three t tests.
Fig. 45.1 Plot of 95% confidence interval (CI) for mean age by
surgical center.


Table 45.1 Age by Surgical Center

Center   N    Mean      Standard Deviation   Variance
1        21   9.2381    2.91384              8.490
2        21   11.2381   3.28489              10.790
3        21   9.3810    3.27836              10.748
Total    63   9.9524    3.2449               10.530

Problems with Multiple t Tests 


There are some disadvantages with this approach. Performing
three t tests does not seem to be a dilemma, but this
process would become more complicated as the number of
groups increases. Another, more important problem is the
matter of multiple comparisons. Suppose that the means of
the three groups are in fact equal and we conduct all three
t tests. These tests are assumed independent and we set the
level of significance for each one at 0.05. By the
multiplicative rule, the probability of failing to reject a
null hypothesis of no difference in all three t tests would
be²:


P (fail to reject null hypothesis in all tests) =
(1 - 0.05)³ = (0.95)³ = 0.857


Consequently, the combined probability of rejecting at
least one null hypothesis, that is, of making a type I
error, would be 0.143 (1 - 0.857), which is much larger than
0.05. The fact that we cannot assume these tests are
independent makes the matter even more complex because the
same dataset is used for each of the t tests.² Therefore, it
is preferable to use an appropriate procedure in which the
overall probability of type I error is equal to an a priori
level of α, usually 0.05.
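
The arithmetic above generalizes to any number of tests. The following Python sketch, with a function name of our own choosing, computes the combined probability of at least one type I error under the same simplifying assumption made in the text, namely that the tests are independent.

```python
def combined_type_i_error(alpha, n_tests):
    """Probability of rejecting at least one true null hypothesis
    across n_tests independent tests, each carried out at level alpha."""
    return 1 - (1 - alpha) ** n_tests


# Three pairwise t tests at the 0.05 level, as in the text.
print(round(combined_type_i_error(0.05, 3), 3))

# The problem grows quickly with the number of tests.
for n_tests in (1, 3, 6, 10):
    print(n_tests, "tests:", round(combined_type_i_error(0.05, n_tests), 3))
```

With three tests the combined error is 0.143, as stated above; with more groups, and therefore more pairwise tests, it rises further.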


Variance and the Sources of Variation 


The variance quantifies the amount of variability or
dispersion around the mean. Or simply, we want to know how
far the observations are from their mean.¹ In conducting an
analysis of variance, the estimates of variance play a major
role. In a one-way ANOVA, we are interested in
distinguishing the mean differences with respect to only
one factor or characteristic.² In our example, this
characteristic is the surgical center. From Table 45.1, we
have a mean age for each surgical center and for the
combined data. As a result, two different sources of
variation are possible. First, there is the variation of
individual values of age around their mean in each surgical
center. This is referred to as within-group variation or
noise. This is also called residual or error when referring
to an ANOVA model. Second, there is the variation of the
means of surgical centers around the overall mean. This is
referred to as between-group variation or signal. Adding
the between-group and within-group variation (as sums of
squares) gives the total variation in the data. The ratio of
the between-group variance to the within-group variance is
the test statistic F of the analysis of variance.


Key Concepts: Interpretation of One-Way Analysis of
Variance²

• If there is a difference in population means, between-
group variation exceeds within-group variation and F is
greater than 1.

• If there is no difference in population means, the
difference between the two sources of variation would be
very small and F is close to 1.


Degrees of Freedom 


Under the null hypothesis, the ratio F has an F distribution 
with k - 1 and n - k degrees of freedom where k is number 
of populations and n is the total sample size. The degrees of 
freedom k - 1 correspond to the between-group variation and
n - k to the within-group variation.¹,² For our example, the
output from SPSS version 15.0 (SPSS Inc., Chicago, IL) is
shown in Table 45.2. The ANOVA table lists the test
statistic F and its corresponding P value. Because the P
value of 0.082 is greater than 0.05, we fail to reject the
null hypothesis that the mean age is the same in the three
surgical centers, and we could proceed with our analysis.
Note that the column labeled “Mean Square” presents 
the between-groups and within-groups estimates of var- 
iance and the F statistic is the ratio of these two values. 
The mean squares are estimated by dividing the column 
of “Sum of Squares” by the corresponding degrees of free- 
dom. The degrees of freedom for between-groups varia- 
tion is 3 - 1 = 2 and for the within groups variation is 
63 - 3 = 60. The degrees of freedom for the combined 
data are 63 - 1 = 62. 
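
The ANOVA table can be reconstructed from the summary statistics alone. The Python sketch below starts from the group sizes, means, and standard deviations reported in Table 45.1 and computes the sums of squares, mean squares, and F statistic; the variable names are our own, and the P value is left to statistical software or an F table.

```python
# (n, mean age, standard deviation) for each surgical center, Table 45.1.
centers = [
    (21, 9.2381, 2.91384),
    (21, 11.2381, 3.28489),
    (21, 9.3810, 3.27836),
]

k = len(centers)                          # number of groups
n_total = sum(n for n, _, _ in centers)   # total sample size

# Overall mean, weighted by group size.
grand_mean = sum(n * mean for n, mean, _ in centers) / n_total

# Between-group sum of squares: group means around the overall mean.
ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in centers)

# Within-group sum of squares: (n - 1) times each group variance.
ss_within = sum((n - 1) * sd ** 2 for n, _, sd in centers)

df_between = k - 1
df_within = n_total - k

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_statistic = ms_between / ms_within

print("between groups:", ss_between, df_between, ms_between)
print("within groups: ", ss_within, df_within, ms_within)
print("F statistic:   ", f_statistic)
```

Apart from rounding in the published summary statistics, the results match the sums of squares (52.286 and 600.571), degrees of freedom (2 and 60), and F statistic (2.612) shown in Table 45.2.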


Multiple Comparison Procedures in ANOVA Model 


What happens when we reject the null hypothesis? In this
case, we can conclude only that the population means
are not all equal; however, we would like to know whether
all means differ from one another or only one differs
from the others. Once we have concluded that there is a


Table 45.2 One-Way Analysis of Variance by Surgical Center

                  Sum of Squares   df   Mean Square   F Statistic   Significance
Between Groups            52.286    2        26.143         2.612          0.082
Within Groups            600.571   60        10.010
Total                    652.857   62


statistically significant difference between the means,
we need to conduct an additional test to identify
where the difference lies. Many methods have been
proposed for a posteriori (post hoc) multiple comparisons.
They typically test each pair of groups separately while
holding the overall probability of a type I error at an a priori
level, usually 0.05. The significance level for each comparison
therefore depends on the number of tests being conducted.
For example, with three groups there are three pairwise
t tests, and under the Bonferroni correction we must use
α = 0.05/3 = 0.017 as the level of significance for each test.
With four groups the number of pairwise tests increases
to six, and the per-test level becomes 0.05/6 = 0.008. The
most commonly used methods are the Bonferroni,
Tukey, and Dunnett tests.¹
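
The Bonferroni adjustment described above divides the overall significance level by the number of pairwise comparisons, which is k(k - 1)/2 for k groups. The following sketch makes that count explicit; the function name `bonferroni_alpha` is introduced here for illustration only.

```python
from math import comb


def bonferroni_alpha(n_groups, overall_alpha=0.05):
    """Return (number of pairwise comparisons, per-comparison alpha)."""
    n_comparisons = comb(n_groups, 2)          # k(k - 1)/2 pairs of groups
    return n_comparisons, overall_alpha / n_comparisons


for k in (3, 4):
    m, alpha = bonferroni_alpha(k)
    print(f"{k} groups: {m} pairwise tests, per-test alpha = {alpha:.4f}")
```

With three groups the per-test level is 0.05/3, about 0.017, and with four groups it is 0.05/6, about 0.008.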

Let us go back to our example. Now that we can combine
the data from the three surgical centers, we wish to
compare the mean operation time between the three
surgical techniques. We would like to test the null
hypothesis that the mean operation times of the three
techniques are equal. The alternative hypothesis is that at
least one of the means differs from the others. Figure 45.2
suggests that the mean operation time of the LAP technique
is most likely different from that of the other two techniques
because the 95% CI for the LAP technique does not overlap
the 95% CIs for the other two, whereas the CIs for the RUQ
and UMB techniques overlap considerably. However, we would
like to perform a formal test and make our conclusion
more specific.

We begin with the preliminary examination; the
summary data are shown in Table 45.3. We assume that
the variances of operation time are equal across the three
surgical techniques because the ratio of the largest to the
smallest standard deviation (SD) is <2, so we can proceed
with our analysis. The results are shown in Table 45.4.
Because the P value of 0.001 is smaller than 0.05, we
conclude that the mean operation time differs
between the surgical techniques. However, we cannot
be more specific than that. We need to conduct an a posteriori
multiple comparisons test to determine where the
difference lies.
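
Because the groups are summarized by their sizes, means, and standard deviations, both the SD-ratio check and the one-way ANOVA table can be rebuilt from the summary data alone. The sketch below uses the values printed in Table 45.3; the variable names are chosen here for illustration.

```python
# Summary data copied from Table 45.3: (n, mean, SD) per surgical technique.
groups = {
    "LAP": (21, 27.2381, 6.77425),
    "RUQ": (21, 35.8333, 11.15273),
    "UMB": (21, 38.9048, 9.85852),
}

# Rule of thumb used in the text: largest SD / smallest SD should be < 2.
sds = [sd for _, _, sd in groups.values()]
sd_ratio = max(sds) / min(sds)

n_total = sum(n for n, _, _ in groups.values())
k = len(groups)
grand_mean = sum(n * mean for n, mean, _ in groups.values()) / n_total

# Between-groups and within-groups sums of squares.
ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups.values())
ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_stat = ms_between / ms_within

print(f"SD ratio = {sd_ratio:.2f}")
print(f"SS between = {ss_between:.1f}, SS within = {ss_within:.1f}")
print(f"F({k - 1}, {n_total - k}) = {f_stat:.2f}")
```

The sums of squares and the F statistic obtained this way should match Table 45.4 up to rounding of the summary values.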








[Figure 45.2 image: mean operation time with 95% CI (vertical axis, ticks from 20 to 45) plotted against type of surgery (horizontal axis: LAP, RUQ, UMB).]

Fig. 45.2 Plot of 95% confidence interval (CI) for the mean
operation time by surgical technique. LAP, laparoscopic; RUQ, right
upper quadrant; UMB, circumumbilical; OR, operating room.
Table 45.3 Operation Time by Surgical Technique

Surgery    N      Mean   Standard Deviation   Variance
LAP       21   27.2381              6.77425     45.890
RUQ       21   35.8333             11.15273    124.383
UMB       21   38.9048              9.85852     97.190
Total     63   33.9921             10.53814    111.052

Abbreviations: LAP, laparoscopic; RUQ, right upper quadrant; UMB,
circumumbilical.


Table 45.4 One-Way Analysis of Variance of Operation
Time by Surgical Technique

                  Sum of Squares   df   Mean Square   F Statistic   Significance
Between Groups          1535.960    2       767.980         8.614          0.001
Within Groups           5349.286   60        89.155
Total                   6885.246   62


The SPSS output of multiple comparison tests for the
Bonferroni procedure is shown in Table 45.5. For each of
the three possible pairwise t tests, the mean differences
and the corresponding P values and 95% CIs of the mean
differences are listed. In this case, the mean operation time of
the LAP technique is significantly less than that of the other two
techniques because the mean differences are negative
and the P values of 0.014 and 0.001 are <0.05. The mean
operation time does not differ significantly between the RUQ
and UMB techniques, with a P value of 0.888.
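
The standard error and confidence limits in Table 45.5 follow from the within-groups mean square and the group sizes. The sketch below recomputes them; note that the critical value `T_CRIT` is an assumption back-calculated from the confidence limits printed in the table (it corresponds to a Bonferroni-adjusted two-sided level of 0.05/3 with 60 degrees of freedom), since the standard library offers no t quantile function.

```python
from math import sqrt

ms_within = 89.155          # within-groups mean square from Table 45.4
n_per_group = 21            # patients per technique (Table 45.3)
means = {"LAP": 27.2381, "RUQ": 35.8333, "UMB": 38.9048}

# Standard error of a difference between two group means (pooled variance).
se_diff = sqrt(ms_within * (1 / n_per_group + 1 / n_per_group))

# ASSUMPTION: critical t value for 60 df at the Bonferroni-adjusted level
# 0.05/3 (two-sided), back-calculated from the CI limits in Table 45.5.
T_CRIT = 2.463

for a, b in (("LAP", "RUQ"), ("LAP", "UMB"), ("RUQ", "UMB")):
    diff = means[a] - means[b]
    lower = diff - T_CRIT * se_diff
    upper = diff + T_CRIT * se_diff
    print(f"{a} - {b}: diff = {diff:.3f}, SE = {se_diff:.3f}, "
          f"95% CI = ({lower:.2f}, {upper:.2f})")
```

An interval that excludes zero, as for the two comparisons involving LAP, corresponds to a Bonferroni-adjusted P value below 0.05.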

The following is an example from the literature. The
investigators used ANOVA to compare the means of
different groups and a Bonferroni test for
multiple comparisons. From the way the results are
presented, we can conclude that a one-way ANOVA was
used for the analysis.


45 Analysis of Variance 


Examples from the Literature: An Example of a One- 
Way ANOVA Analysis with an A Posteriori Multiple 
Comparison Test 

Source: Gudaityte J, Marchertiene I, Pavalkis D, Salad- 
zinskas Z, Tamelis A, Tokeris I. Minimal effective dose 
of spinal hyperbaric bupivacaine for adult anorectal 
surgery: a double-blind, randomized study. Medicina 
(Kaunas.) 41(8):675-684.? 

Abstract: The aim of the study was to find minimal effec- 
tive dose of spinal hyperbaric bupivacaine for adult an- 
orectal surgery. 

Methods: The study included 93 adult consecutive pa- 
tients admitted for anorectal operations. Dural puncture 
was made before surgery in the sitting position at L3-L4 
or L4-L5 with 25-26G Tamanho spinal needle (Braun, 
Germany) and different volumes of hyperbaric bupiva- 
caine (Marcaine Spinal Heavy 0.5%, AstraZeneca) were 
injected over 2 minutes: group 1 (n = 17) 1.5 ml, group 
2 (n = 38) 1.0 ml, group 3 (n = 38) 0.8 ml. After sitting 
for 10 minutes patients were asked to lie down and 
surgery was started. Following variables were assessed: 
rate of success, level and duration of sensory and motor 
block, time to voiding and ambulation, complications, 
consumption of analgesics, quality of anesthesia accord- 
ing to the patient and medical staff. 

Results: Groups were comparable in demographics. No 
case of failure was registered but four patients (10.5%) 
in the group 3 received supplemental i/v fentanyl to 
treat tension in the abdomen intraoperatively. Level of 
sensory block in groups 1, 2, 3 was 10.4 ± 1.7, 7.013 ±
2.2, 6.7 ± 1.9 dermatomes, respectively (P < 0.0001
ANOVA; P < 0.0001 group 1 vs. 2, group 1 vs. 3, P = 1.0
group 2 vs. 3, Bonferroni). Extent of motor block was
2-3 scores according to the Bromage scale in 70.5% of
group 1 cases, compared with 0-1 score in 97.3% of
group 2 and 92.1% of group 3 cases. Median (range)
duration of motor block in groups 1, 2, 3 was 90
(0-120), 0 (0-90), and 0 (0-60) min, respectively
(P < 0.0001 ANOVA; P < 0.0001 group 1 vs. 2, group 1
vs. 3, P= 0.13 group 2 vs. 3, Bonferroni). Time of ambula- 


Table 45.5 Multiple Comparisons of Operation Time by Surgical Technique

(I) Surgery   (J) Surgery   Mean Difference (I-J)   Standard Error   Significance   95% CI Lower Bound   95% CI Upper Bound
LAP           RUQ                       -8.59524*          2.91392          0.014             -15.7721              -1.4184
              UMB                      -11.66667*          2.91392          0.001             -18.8435              -4.4898
RUQ           LAP                        8.59524*          2.91392          0.014               1.4184              15.7721
              UMB                       -3.07143           2.91392          0.888             -10.2483               4.1054
UMB           LAP                       11.66667*          2.91392          0.001               4.4898              18.8435
              RUQ                        3.07143           2.91392          0.888              -4.1054              10.2483

* The mean difference is significant at the 0.05 level.

Abbreviations: LAP, laparoscopic; RUQ, right upper quadrant; UMB, circumumbilical.
tion was 181.5 ± 41.5, 136.6 ± 32.2 and 123.0 ± 45.9 minutes,
respectively (P < 0.0001 ANOVA; P < 0.001 group 1
vs. 2, P < 0.00001 group 1 vs. 3, P = 0.43 group 2 vs. 3,
Bonferroni). There was no significant intergroup difference
in time to urinate; retention developed in 20.4% of total
cases. No difference was found in morphine consumption;
64.5% of cases did not require rescue analgesics.
Quality of anesthesia was stated as excellent by the
anesthesiologist and surgeon in all groups. However,
quality was rated as excellent by patient in the operating


Two-Way Analysis of Variance 


As we discussed earlier, one-way ANOVA classifies
the means by only one categorical variable, or factor.
What happens when we are interested in the effect of
two independent variables, or factors, on an
outcome or response variable? Each of these
factors has its own categories (levels). We could of course
apply one-way ANOVA to examine the effect of each factor
separately, but a two-way, or factorial, ANOVA
model has important advantages over several one-way,
or single-factor, ANOVA models.¹ The most important
of these advantages is that it provides unique and
relevant information about how the factors affect the
outcome variable independently (main effects) and how
they affect it in combination (interaction). The assumptions
that patient populations are independent and that the
observations in each population are normally
distributed also apply to the two-way ANOVA model. We
once more assume that the variances are equal, estimate
a pooled variance, and use F statistics for significance
tests. The null hypothesis again is that the population
means are equal. The alternative hypothesis is that at least
one of the population means is different from the others. Note
that there is more than one null hypothesis in two-way
ANOVA, with an F test for each.¹ We can test for the significance
of the main effect of each factor and of the interaction
between those factors.


Key Concepts: Advantages of Two-Way ANOVA over
Multiple One-Way ANOVA¹

• “It is more efficient to study two factors or variables
simultaneously rather than separately.”

• “We can investigate interactions between those
factors.”

• We can reduce the residual variation and increase the
power of our test by including a second factor thought
to influence the outcome.

• The same approach can be extended to more than
two factors.


Proceeding with our example, let us assume that we wish 
to examine the mean operation time by surgeon as well as 
the surgical technique. Also, we would like to examine the 


room in groups 1, 2, 3: 58.8, 94.7, and 86.8%, respectively 
(P = 0.003), on day 1 postoperatively: 76.5, 92.1, and 
97.4%, respectively (P = 0.023); by nursing staff: 82.4, 
100, and 97.4%, respectively (P = 0.019). Lower rates in 
group 1 were due to extensive motor block. In conclu- 
sion, a minimal recommended dose of spinal hyperbaric 
bupivacaine for anorectal surgery is 4-5 mg; a dose of 
7.5 mg is excessive due to prolonged sensory and motor 
block. 


Table 45.6 Surgical Technique and Surgeon Factors and
Their Levels with Mean Operation Time

                    Surgical Technique
Surgeon          LAP    RUQ    UMB    Mean OR time
1                  7      7      7              31
2                  7      7      7              36
3                  7      7      7              36
Mean OR time      27     36     39              34

Abbreviations: LAP, laparoscopic; RUQ, right upper quadrant;
UMB, circumumbilical; OR, operating room.


combined effect of these factors. Table 45.6 summarizes
the factors and their levels with the mean operation time for
this example. There are three levels for each factor. As we
can see, the mean operation times for surgeon 1 and for the LAP
technique seem to be lower than those for the other levels of
each factor. We examined the effect of surgical technique on the
mean operation time in the previous section and
concluded that the LAP technique had a significantly shorter
mean operation time than the other two techniques. Of
course, we could perform another one-way ANOVA
to examine the effect of surgeon on the mean operation
time, but we wish to examine the effect of each of these
factors independently as well as in combination.

First, we would like to plot the mean operation time by
surgeon and surgical technique (Fig. 45.3). The plot suggests
that the mean operation time is greater for the UMB and
RUQ techniques than for the LAP technique for all three
surgeons. Also, the mean is smaller for surgeon 1 than for
surgeons 2 and 3, and this is true for all three surgical
techniques. From this figure, we can see that both factors,
surgical technique and surgeon, appear to have a main effect
on the mean operation time. There also seems to be an
interaction between surgeon and surgical technique because
the lines are not parallel. However, we would like to
conduct a formal test to examine these effects and the
significance of their interaction on the mean operation
time more closely.
Fig. 45.3 Plot of estimated marginal means of operation time.
LAP, laparoscopic; RUQ, right upper quadrant; UMB,
circumumbilical; OR, operating room.


The output from Minitab 14.0 (Minitab Inc., State College,
PA) presents the degrees of freedom, the between-groups
and within-groups variances, and the F statistics
and P values (Table 45.7). Note that there are three null
hypotheses in this two-way ANOVA, with an F test for each.¹
We tested for the significance of the main effects of surgical
technique and surgeon and of their interaction, and
consequently we have an F statistic for each.
As mentioned before, the mean squares (MS) are
calculated by dividing the sum of squares (SS) by the
corresponding degrees of freedom. The F statistics are calculated
by dividing the MS of each factor by the MS of the error term
(within-group variance). The degrees of freedom are calculated
as 3 - 1 = 2 for each of the surgical technique and surgeon
factors, (3 - 1)(3 - 1) = 4 for the interaction, and
63 - (3 x 3) = 54 for the error term.
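
The degrees of freedom, F statistics, and summary measures of the two-way analysis can be checked from the sums of squares alone. In the sketch below, the sums of squares are copied from Table 45.7; the error sum of squares is not read from the table but derived here as the total minus the three model terms, which is an assumption of this illustration, and the variable names are likewise illustrative.

```python
# Values copied from Table 45.7 (two-way ANOVA, 3 techniques x 3 surgeons).
n_total, levels_a, levels_b = 63, 3, 3
ss = {"technique": 1535.96, "surgeon": 358.53, "interaction": 120.83}
ss_total = 6885.25

df = {
    "technique": levels_a - 1,
    "surgeon": levels_b - 1,
    "interaction": (levels_a - 1) * (levels_b - 1),
}
df_error = n_total - levels_a * levels_b
df_total = n_total - 1

# ASSUMPTION: the error (within-cell) sum of squares is what remains of
# the total after the two main effects and the interaction are removed.
ss_error = ss_total - sum(ss.values())
ms_error = ss_error / df_error

# Each F statistic is the term's mean square divided by the error mean square.
f_stats = {name: (ss[name] / df[name]) / ms_error for name in ss}

r_squared = 1 - ss_error / ss_total
adj_r_squared = 1 - ms_error / (ss_total / df_total)

for name, f in f_stats.items():
    print(f"{name}: df = {df[name]}, F = {f:.2f}")
print(f"error: df = {df_error}, MS = {ms_error:.3f}, S = {ms_error ** 0.5:.3f}")
print(f"R^2 = {r_squared:.2%}, adjusted R^2 = {adj_r_squared:.2%}")
```

The resulting F statistics, S, R², and adjusted R² should agree with the Minitab output in Table 45.7 to the precision printed there.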

The results suggest that there is a main effect of surgical
technique on the mean operation time, with a P value of
0.001, but the effect of surgeon on the mean operation
time is not statistically significant because the P value of
0.147 is >0.05. There is no significant interaction
between surgical technique and surgeon in their effect on
the mean operation time because the P value of 0.853 is
far larger than 0.05. Note that these nonsignificant results
may be due to the small sample size, that is, the study
may be underpowered. R², the coefficient of
determination, is calculated by dividing the sum of the
between-group variations by the total variation; it indicates
that between-group variation accounts for only 29.27% of the
total variation in the data. The coefficient of determination
becomes even smaller when the number of observations
is taken into consideration (adjusted R² = 18.79%).

The following is an example from the literature. The investigators
used a two-way ANOVA to examine the effect of
growth hormone and parenteral nutrition on the
preservation of muscle mass in surgical patients.


Examples from the Literature: An Example of a Two- 
Way ANOVA Analysis 

Source: Sevette A, Smith RC, Aslani A, et al. Does growth 
hormone allow more efficient nitrogen sparing in post- 
operative patients requiring parenteral nutrition? A 
double-blind, placebo-controlled randomized trial. Clin 
Nutr 2005;24(6):943-955.* 

Abstract: Growth hormone (GH) has a strong anabolic 
effect and is thought to be useful in improving the effi- 
cacy of parenteral nutrition (PN) to preserve muscle 
mass (MM) in the postoperative setting. Unfortunately, 
the negative clinical outcome of GH treatment in inten- 
sive care patients limits its use in this setting, but de- 
mands answers to the mechanism behind the action of 
this therapy. 

Method: In a double-blind randomized controlled study 
consecutive patients after major abdominal surgery 
were divided into four groups of either 1/2-PN (0.13 g 
N/kg/day and 52% of calories as lipid) or full-strength 
PN (Full-PN) (0.3 g N/kg/day and 65% of calories as lipid) 
receiving daily injections of either GH (8-16 IU) or pla- 
cebo for a period of 14 days postoperative. Outcome 
measures included MM derived from measures of total 
body potassium (40K counting) and total body nitrogen 
(TBN) (in vivo neutron capture technique); fat mass from 
skin folds; serum insulin like growth factor-I (IGF-I) and 
its binding proteins (IGFBP). 

Results: From 43 major upper GI surgical patients ran- 
domized 35 completed the study (one patient died 
from sepsis in the half-strength PN (1/2-PN)+GH 
group). 1/2-PN (n = 11) lost TBN (P = 0.001), MM (P = 
0.005) but not fat. Full-PN (n = 9) maintained TBN, MM 
(P = 0.056) and fat. 1/2-PN+GH (n = 8) maintained 
TBN and fat but lost MM (P = 0.038). Full-PN+GH (n = 7) 
maintained TBN and MM but lost fat (P = 0.018). 
Two-way ANOVA indicated that PN input (P = 0.031) 
and not GH had a significant effect on MM. GH caused 


Table 45.7 Two-Way ANOVA of Operation Time by Surgical Technique and Surgeon

Source               df   Sum of Squares   Mean Square   F Statistic   P Value
Surgical technique    2          1535.96       767.980          8.52     0.001
Surgeon               2           358.53       179.266          1.99     0.147
Interaction           4           120.83        30.206          0.33     0.853
Total                62          6885.25

S = 9.497   R² = 29.27%   Adjusted R² = 18.79%

Abbreviations: LAP, laparoscopic; RUQ, right upper quadrant; UMB, circumumbilical.
a significant rise in IGF-I levels (290 ± 67 and 454 ± 71 μg/l
for 1/2-PN+GH and Full-PN+GH, respectively) and
restored serum IGFBP3 and the acid labile subunit to
normal, by the postoperative day 9.

Conclusion: After major gastrointestinal surgery, GH 
causes a marked hepatic IGF-I response and nitrogen re- 
tention but its effect on body composition was more sig- 
nificant with a high PN input. Further, Full-PN alone was 
sufficient to prevent nitrogen loss and preserved MM 
and addition of GH does not provide further metabolic 
advantage. 


Note that we could test for the effect of more than two
categorical variables using a general linear model, but that is
outside the scope of this book.
Key Concepts: Key Steps in Conducting an ANOVA Analysis¹

1. Make sure that the assumptions are met and that you have
chosen the appropriate ANOVA design.

2. Define your dependent variable (outcome or
response variable).

3. Define your independent variables (categorical
factors).

4. Compute the sample means, standard deviations,
and variances for all groups.

5. Plot the means and 95% CIs (or side-by-side box plots) to
check the distribution of the observations and any
deviation from normality.

6. “Compute ratio of the largest to the smallest sample
standard deviation.”

7. If this ratio is smaller than 2 and the distribution is
satisfactory, conduct the appropriate ANOVA.

8. Interpret the results appropriately.
Correlation Defined


“It is the mark of an educated mind to be able to entertain a thought without accepting it.”
— Aristotle


Summary


The goal of this chapter is to provide a basic understanding
of statistical correlation and its usage in the orthopaedic
literature. Common equations used to derive a correlation
coefficient, Spearman rank correlation, the confidence
interval, and a P value are reviewed.


Introduction


The correlation between two variables can be quantified
by the correlation coefficient r. The square of the
correlation coefficient, r², is easier to interpret: it is the
proportion of the variance in one variable that is explained by the
variance in the other. A P value is calculated to test the
significance of a correlation. If a Gaussian distribution cannot be
assumed (at least approximately), then a nonparametric
form of correlation must be used.

Correlation evaluates the similarity of two sets of
measurements (i.e., two dependent variables) obtained on
the same observations.


Jargon Simplified: Variables

A variable is a quantity that can take various values for
different individuals. These are values that we measure,
control, or manipulate in research.


Jargon Simplified: Independent versus Dependent
Variable

An independent variable is one that influences the
outcome being measured, which is the dependent variable.
The independent variable might be a variable that you
control, like a treatment, or a variable not under your
control, like an exposure. It also might be a demographic
factor like age or gender. Therefore, an independent
variable is a hypothesized cause of or influence on a dependent
variable in a study.


Correlation is a common statistical technique used to study
the relation between two variables within a group of
subjects. The reasons for using this technique include:

1. To assess if the two variables are associated, that is, if
the values of one variable tend to be higher (or,
alternatively, lower) for higher values of the other variable.

2. To assess if the values of one variable can be predicted
from any known value of the other variable.

3. To assess the amount of agreement between the values
of the two variables, something commonly seen in the
comparison of alternate ways of measuring or assessing
the same thing.


Key Concepts: Correlation 

Correlation is a statistical method of analysis used when 
studying the possible association between two continu- 
ous variables. Two variables are correlated if changes in 
one variable tend to be accompanied by changes in the 
other, in either the same or the opposite direction. 


Jargon Simplified: Continuous Variables versus 
Categorical Variables 

Continuous variables are variables with potentially infi- 
nite numbers of possible values. Examples include 
height, weight, blood pressure, or range of movement. 
Categorical variables are variables whose values belong 
to one of several distinct categories. Examples include 
gender or fracture classification. 


Correlation is a statistical method of analysis used when
studying the possible association between two continuous
variables. Two variables are correlated if changes in one
variable tend to be accompanied by changes in the other,
in either the same or the opposite direction. For example,
Rogachefsky et al,¹ in a study of the outcome of operative
treatment of complex articular fractures of the distal
radius, showed immediate postoperative articular
incongruity to be positively correlated with functional
outcome; that is, the more comminuted fracture patterns
were associated with worse functional outcome.


Examples from the Literature: Correlation 

Let us consider the example of a study by Hughes et al.²
The authors sought to demonstrate the importance of
the toes during walking. They used a dynamic
pedobarograph to examine the weight-bearing function of the
foot in a large number of people without foot problems.
The subjects were aged between 5 and 78 years with an
equal number of males and females in each 5-year group
until aged 30 years and in each 10-year age group
thereafter. Figure 46.1 is a scatter diagram showing the
relationship between the weight (kg) of a subject and the
peak pressure (kPa) under his or her great toe (which,
of all toes, bears the greatest peak pressure) in the left
foot (these are hypothetical data from 90 subjects based
on the results of Hughes et al²). The scatter plot shows a
positive linear relationship between the two variables,
indicating that as a subject’s weight increases there is a
tendency for the peak pressure under the great toe of
the left foot to increase as well. That is, the two variables
vary together, or are positively correlated.


Coefficient of Correlation r 


The degree of association between two variables is measured
by calculating the correlation coefficient, also known
as the product-moment correlation coefficient or Pearson
coefficient. The standard method leads to a quantity called
r, whose value can range from -1 to 1 (inclusive). The
correlation coefficient r measures the direction and magnitude
of the linear correlation between the values of the two
variables, although correlation does not imply causation. If
the correlation coefficient is zero, then there is no linear
relationship between the values of the two variables (that is,
they are uncorrelated). At the other extreme, a value of +1.0
or -1.0 indicates that the points in a scatter diagram lie on a
perfectly straight line, as shown in Fig. 46.2. Examples of
scatter plots with intermediate values of r are shown in Fig. 46.3.

In essence, what r measures is the degree of scatter of the
values around an underlying straight line: the greater the
scatter of the values, the lower the correlation.
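
As a minimal sketch, the product-moment coefficient can be computed directly from paired values. The function name `pearson_r` and the weight and pressure values below are hypothetical, introduced only for illustration; they are not the data of Hughes et al.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of paired values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical illustration data (not taken from any study).
weight_kg = [52, 60, 68, 75, 83, 90]
peak_pressure_kpa = [310, 295, 360, 340, 420, 385]

print(f"r = {pearson_r(weight_kg, peak_pressure_kpa):.2f}")
print(pearson_r([1, 2, 3], [2, 4, 6]))   # points on a rising straight line
print(pearson_r([1, 2, 3], [6, 4, 2]))   # points on a falling straight line
```

The last two calls illustrate the extremes: points lying exactly on a rising or falling straight line give r of +1 or -1, respectively.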


Key Concepts: Directions of Correlations 

Two variables are correlated if changes in one variable 
tend to be accompanied by changes in the other, in 
either the same or the opposite direction. If the correla- 
tion coefficient is positive, the two variables tend to in- 
crease together. If the correlation coefficient is negative, 
the two variables are inversely related, that is, the values 
of one variable are lowered as the values of the other 
variable get higher. 


Jargon Simplified: Correlation Coefficient 

The correlation coefficient, denoted by r, is a measure of 
association that indicates the degree to which two vari- 
ables have a linear relationship. 


Key Concepts: Pearson Correlation Coefficient

The Pearson correlation coefficient is a widely used
technique for assessing correlation and is the basis for
techniques that determine mathematical functions (such as
regression analysis) or patterns of interdependence (such
as factor analysis).
Fig. 46.1 Scatter diagram showing the linear relationship
between peak pressures of the great toe in the left foot and
weight in 90 subjects. (From Petrie A. Statistics in orthopaedic
papers. J Bone Joint Surg Br 2006;88-B, No. 9:1121-1136.
Reprinted by permission.)




















Fig. 46.2 Data with correlation coefficient (r) of (a) 1.0 and (b) 
-1.0. (From Altman DG. Practical Statistics for Medical Research. 
Boca Raton, FL: Taylor & Francis CRC Press;1991:280. Reprinted by 
permission.) 
Fig. 46.3 Data with correlation coefficients (r) of (a) 0.0, (b) 0.3,
(c) -0.5, and (d) 0.7. (From Altman DG. Practical Statistics for
Medical Research. Boca Raton, FL: Taylor & Francis CRC
Press;1991:280. Reprinted by permission.)
Properties of the Coefficient of Correlation


The coefficient of correlation is a number without units.
This occurs because the units of the numerator are divided
by the same units in the denominator, which cancels
them. Hence, the coefficient of correlation can be
used to compare different studies performed using
different variables.

The magnitude of the coefficient of correlation is always
smaller than or equal to 1. This happens because the
absolute value of the numerator of the coefficient of correlation
is always smaller than or equal to its denominator, which
follows from the Cauchy-Schwarz inequality.
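
Both properties can be read off the usual formula for the product-moment coefficient. Writing the paired observations as (x_i, y_i) for i = 1, ..., N, with means \bar{x} and \bar{y} (notation assumed here for illustration):

```latex
\[
r = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{N}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{N}(y_i - \bar{y})^2}}
\]
\[
\Bigl|\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})\Bigr|
\le \sqrt{\sum_{i=1}^{N}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{N}(y_i - \bar{y})^2}
\quad\Longrightarrow\quad |r| \le 1
\]
```

The numerator and the denominator both carry the units of X multiplied by the units of Y, so the ratio is dimensionless.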


Interpreting the Coefficient of Correlation 


The coefficient of correlation measures only the linear
relationship between two variables and will miss a nonlinear
relationship. For example, Fig. 46.4 displays a perfect
nonlinear relationship between two variables (that is, the data
show a U-shaped relationship with Y being proportional to
the square of W), but the coefficient of correlation is equal
to 0. The significance of the correlation coefficient is highly
dependent on the number of observations in the sample:
the greater the sample size, the smaller the P value
associated with a correlation coefficient of a given magnitude.
Therefore, to assess the linear association between two
variables fully, we should calculate r², as well as r and its
P value.
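
The U-shaped case can be verified numerically. In this sketch the values of W are hypothetical and chosen to be symmetric about zero, and Y is set exactly equal to the square of W; `pearson_r` is an illustrative helper, not a library function.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of paired values."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: W symmetric about zero, Y an exact function of W.
w = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in w]

r = pearson_r(w, y)
print(f"r = {r:.2f}")
```

Although Y is completely determined by W, the positive and negative cross products cancel, so the linear correlation is zero.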

















Fig. 46.4 A perfect nonlinear relationship with a zero correlation. 
(From Salkind NJ, ed. Encyclopedia of Measurement and Statistics. 
Newbury Park, CA: Sage Publications;2007:160. Reprinted by 
permission.) 
Fig. 46.5 The correlation of the set of points represented by the
circles is equal to -0.87; when the point represented by the
diamond is added to the set, the correlation becomes +0.61.
This shows that an outlier can completely determine the value of
the coefficient of correlation. (From Salkind NJ, ed. Encyclopedia
of Measurement and Statistics. Newbury Park, CA: Sage
Publications;2007:160. Reprinted by permission.)


The Effect of Outliers 


Outliers can influence all statistical calculations, especially 
correlation. Observations far from the center of the distri- 
bution contribute a lot to the sum of the cross products. At 
the extreme, as illustrated in Fig. 46.5, one extremely devi- 
ant observation (often called an outlier) can dramatically 
influence the value of r. 
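
The influence of a single deviant observation is easy to demonstrate. The data below are hypothetical (they are not the points plotted in Fig. 46.5): five points with a strong negative relationship, to which one distant point is then added. `pearson_r` is again an illustrative helper.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of paired values."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data with a clear negative trend.
x = [1, 2, 3, 4, 5]
y = [5, 4, 4, 2, 1]
r_before = pearson_r(x, y)

# Add one observation far from the center of both distributions.
r_after = pearson_r(x + [15], y + [15])

print(f"without outlier: r = {r_before:.2f}")
print(f"with outlier:    r = {r_after:.2f}")
```

The single added point contributes a very large cross product and reverses the sign of r, which is why a scatter plot should always be inspected before a correlation coefficient is interpreted.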


Interpreting r²


The coefficient of correlation is a descriptive statistic that,
in squared form, tends to overestimate the strength of the
population correlation, particularly in small samples.
To obtain a better estimate for the population, the value of r
needs to be corrected. The corrected value of r goes by
different names: corrected r, shrunken r, or adjusted r, and it
is usually reported in squared form (adjusted r²). There are
several correction formulas available to estimate the value
of the population correlation.

The squared coefficient of correlation gives the proportion
of common variance between two variables. It is also
called the coefficient of determination. r² is an easier value
to interpret than r. Because r is always between -1 and 1, r²
is always between 0 and 1 and is never larger than the
absolute value of r.

If the general assumptions listed in the following section
are true, r² can be interpreted as the fraction of the
variance that is shared between the two variables.
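
One commonly quoted correction, given here only as an illustration because the text does not specify which formula it has in mind, shrinks r² according to the number of paired observations N:

```latex
\[
r^2_{\text{adj}} = 1 - \frac{(1 - r^2)(N - 1)}{N - 2}
\]
```

For example, r = 0.38 with N = 90 gives r² = 0.144 and an adjusted value of about 0.135; the correction becomes negligible as N grows.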


Assumptions Associated with the Coefficient of 
Correlation 


The coefficient of correlation can be calculated from any set 
of data and is a very useful descriptor of data. However, to 
make inferences from the coefficient of correlation, the fol- 
lowing assumptions should be true: 

1. Subjects should be randomly selected from or be at 
least representative of a larger population. 

2. Paired samples: Each subject should have both X and Y 
values. 

3. Independent observations: Sampling one subject of the 
population should not influence the chances of sam- 
pling other subjects. The relationship between the 
sample subjects must be the same. 

4. X and Y values must be measured independently. The 
values of X and Y should not be interrelated as the cor- 
relation calculations then become meaningless. 

5. X values should be measured and not controlled. If the 
X variable is manipulated (i.e., concentration, dose, or 
time), then linear regression should be calculated 
rather than correlation. The confidence interval of r 
cannot be interpreted if the experimenter controls or 
manipulates the value of X. 

6. All covariation must be linear. The correlation
coefficient would not be meaningful, for example, if Y
increases as X increases up to a certain point but then
decreases as X increases further.

7. Normal (Gaussian) distribution: Many statistical tests 
are based on the assumption that the data are sampled 
from Gaussian populations. This implies that the X and 
Y values must each be sampled from populations that 
follow a normal distribution, at least approximately. 


Jargon Simplified: Confidence Interval 

The confidence interval is a computed range of values
that, with a given level of confidence, often 95%, contains the
true value of a variable. For example, if a sample
mean is 23 and the lower and upper limits of the 95%
CI are 19 and 27, respectively, then you can
be 95% confident that the population mean lies between
19 and 27. The width of a confidence interval depends on
the sample size and on the variation of the data values.


The 95% Confidence Interval of a Correlation 
Coefficient 


Clinical studies are typically done on sets of patients or on 
a sample population. Because it is not possible to study an 
entire population, data gathered from these sample popu- 
lations are generalized to the entire population. This pre- 
sents the uncertainty of how well a sample represents 
the general population. This uncertainty or error could 
be either systematic or random. The margins of error in 
these variables may be represented by confidence intervals 
(CIs). 

In clinical studies, often when comparing groups, CIs are 
used to express the difference between two variables. The 
CI is calculated using a standard formula. In the study by 
Hughes et al, data from Fig. 46.1 give us a correlation 
coefficient of 0.38. Using the standard equation, the 95% 
CI for the above correlation coefficient is from 0.18 to 0.54. 
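
The calculation behind such an interval can be sketched in a few lines of Python. This is a minimal illustration that assumes the usual Fisher z transformation approach; the function name `correlation_ci` is hypothetical, and because the sample size of the study is not stated in this section, n = 100 is an assumed value. The result is therefore not expected to reproduce the published limits exactly.

```python
import math

def correlation_ci(r, n, z_crit=1.96):
    """Approximate CI for a correlation coefficient via the Fisher z transformation."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)              # same as 0.5 * ln((1 + r) / (1 - r))
    se = 1.0 / math.sqrt(n - 3)    # approximate standard error of z
    z_low, z_high = z - z_crit * se, z + z_crit * se
    # Back-transform the limits from the z scale to the r scale.
    return math.tanh(z_low), math.tanh(z_high)

# r = 0.38 is the coefficient quoted above; n = 100 is an assumed sample size.
low, high = correlation_ci(0.38, 100)
print(f"95% CI: {low:.2f} to {high:.2f}")
```

Note that the interval is not symmetric around r, because the transformation stretches values near -1 and +1 more than values near 0.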


Hypothesis Test for r 


Hypothesis testing is a common approach in the orthopaedic
literature. Cohort studies or trials are usually designed 
to test a hypothesis. The hypothesis derives from an
investigator’s research question. The relationship between r and 
the P value depends on N, the number of XY pairs. The test of 
significance of the null hypothesis is based on the t
distribution and is calculated using a standard equation, which 
converts r into t. 
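
As a rough sketch of this conversion, the Python below turns r into t and then into a two-tailed P value. The helper names `t_from_r` and `two_tailed_p` are hypothetical, the example values of r and N are made up, and the P value is obtained by numerically integrating the t density with Simpson's rule rather than by any method named in the text.

```python
import math

def t_from_r(r, n):
    """Convert a correlation coefficient r from n XY pairs into a t statistic."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def two_tailed_p(t, df, steps=2000):
    """Two-tailed P value from Student's t distribution with df degrees of freedom.

    Integrates the t density from 0 to |t| with Simpson's rule (steps must be even).
    """
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def density(x):
        return const * (1.0 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps
    total = density(0.0) + density(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    central_area = total * h / 3.0   # probability of a value between 0 and |t|
    return 1.0 - 2.0 * central_area

# Hypothetical example: r = 0.50 observed in N = 12 XY pairs.
r, n = 0.50, 12
t = t_from_r(r, n)
p = two_tailed_p(t, df=n - 2)
print(f"t = {t:.2f}, df = {n - 2}, two-tailed P = {p:.3f}")
```

As a check against Table 46.1, `two_tailed_p(2.0, 10)` agrees with the tabulated value of .073 for t = 2.0 with 10 degrees of freedom.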


Jargon Simplified: P Value 

The P value, the statistical significance of a result, is the 
probability that the observed relationship between vari- 
ables or difference between means in a sample occurred 
by chance, and that in the population from which the 
sample was drawn, no such relationship or differences 
exist. 


Spearman Rank Correlation 


If one or both of the variables are measured on an ordinal 
scale, or if we are concerned about the normality of distri- 
bution of the numerical variable(s), we can calculate the 
nonparametric Spearman correlation coefficient, which 
also takes values from -1 to +1. Its interpretation is similar 
to that of the Pearson correlation coefficient, although it 
provides an assessment of association rather than linear 
association, and its square is not a useful measure of good- 
ness-of-fit. 
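
The difference between association and linear association can be shown with a short sketch. The Python below is illustrative only: the data are made up, the helper names are hypothetical, and giving tied values the average of their ranks is an assumption, since ties are not discussed here.

```python
import math

def average_ranks(values):
    """Rank values from low to high (smallest = 1); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1     # average of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson correlation coefficient from sums of squares and cross products."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_r(xs, ys):
    """Spearman coefficient: the Pearson coefficient of the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Made-up data: Y always increases with X, but not along a straight line.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]
print(f"Pearson r  = {pearson_r(x, y):.3f}")
print(f"Spearman r = {spearman_r(x, y):.3f}")
```

Because the ranks of X and Y agree perfectly, the Spearman coefficient reaches its maximum even though the Pearson coefficient does not.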


Statistical Calculations 


The Coefficient of Correlation 


The equation for calculating r can be expressed in several 
ways. If we have two variables X and Y, the correlation
between them, denoted by r, is given by 

\[
r = \frac{\displaystyle\sum_{i=1}^{N} \left(\frac{X_i - \bar{X}}{S_x}\right)\left(\frac{Y_i - \bar{Y}}{S_y}\right)}{N - 1}
\]


Jargon Simplified: Null Hypothesis 

A null hypothesis is the statistical hypothesis that one 
variable has no association with another variable or set 
of variables, or that two or more population distribu- 
tions do not differ from one another. 


Examples from the Literature: Pearson Correlation 
and P Value 

Rogachefsky et al, in a study of the outcome of operative 
treatment of complex articular fractures of the distal ra- 
dius, explored the association between immediate post- 
operative articular incongruity and outcome using a 
clinical rating scale. Total incongruity (step plus gap, 
measured in millimeters) and the outcome scale were 
continuous variables and assumed to be normally
distributed. In these circumstances, the authors calculated 
the Pearson correlation coefficient to be 0.70, with a P 
value of less than 0.002. 


Jargon Simplified: Normal Distribution 

Normal distribution can be shown in a graph of fre- 
quency distribution of data denoted by a bell-shaped 
curve. Normal distribution is a continuous symmetric 
distribution; the mean, median, and mode are identical; 
and its shape is entirely determined by the mean and 
standard deviation. All parametric tests require the 
data to have a normal distribution whereas nonpara- 
metric tests are so-called distribution free. 


In the equation for r, \(\bar{X}\) is the mean X value, \(S_x\) is the
standard deviation of all X values, and N is the number of data points
(\(\bar{Y}\) and \(S_y\) are defined in the same way for Y). The mean X
and mean Y values define a central point of the data. The position of
each point is compared with that center. The horizontal distance
\((X_i - \bar{X})\) is positive for points to the right of the center
and negative for points to the left. The vertical distance
\((Y_i - \bar{Y})\) is positive for points above the center and
negative for points below.

The distances are standardized by dividing by the standard
deviation (SD) of X or Y. The quotient is the number of SDs
that each point is away from the mean. The two standardized
distances are multiplied in the numerator of the equation.
The magnitude of the numerator depends on the number of
data points. Finally, account for sample size by dividing
by N - 1. N is the number of XY pairs.


The Confidence Interval


Once r is calculated from data in one sample, the equation
to calculate the CI is:

\[
z = \frac{1}{2}\ln\left(\frac{1 + r}{1 - r}\right)
\]

where r is transformed to get the quantity z. The standard
error of z is approximately

\[
SE_z = \frac{1}{\sqrt{n - 3}}
\]

where n is the sample size, so the equation to construct a
95% CI for z is

\[
z_1 = z - \frac{1.96}{\sqrt{n - 3}}, \qquad z_2 = z + \frac{1.96}{\sqrt{n - 3}}
\]

The above values are back-transformed to get a CI for the
population correlation coefficient r as:

\[
\frac{e^{2z_1} - 1}{e^{2z_1} + 1} \quad \text{to} \quad \frac{e^{2z_2} - 1}{e^{2z_2} + 1}
\]


The Spearman Correlation Coefficient


The test is based on a simple idea. The values are listed in
order from low to high, and a rank is assigned to each value.
All further analyses are based on the ranks. By analyzing
ranks rather than values, you do not need to care about the
distribution of the population. Spearman rank correlation is
based on the same assumptions as the Pearson correlation
described earlier, with the exception that rank correlation
does not assume Gaussian distributions.

First, the values of X and Y are separately ranked. The
smallest value gets a rank of 1. Then calculate the correlation
coefficient between the X ranks and the Y ranks using
the same equation for r. The resulting coefficient is called
\(r_s\). A special method is used to calculate the 95% CI of \(r_s\).


The P Value from the Correlation Coefficient


The relationship between r and the P value depends on n,
the number of XY pairs. The equation used to determine P
is

\[
t = r\sqrt{\frac{n - 2}{1 - r^2}}
\]

which converts r to t. It is common to use tables to find the
P value, such as Table 46.1. The number of degrees of
freedom equals n - 2, and this determines which column in the
table to use. Find the row corresponding to the closest
value of t and read the P value.
Table 46.1 Determining a P Value from t 


      Degrees of Freedom 

t     6      7      8      9      10     11     12     13     14     15     16

1.0   .356   .351   .347   .343   .341   .339   .337   .336   .334   .333   .332
1.1   .313   .308   .303   .300   .297   .295   .293   .291   .290   .289   .288
1.2   .275   .269   .264   .261   .258   .255   .253   .252   .250   .249   .248
1.3   .241   .235   .230   .226   .223   .220   .218   .216   .215   .213   .212
1.4   .211   .204   .199   .195   .192   .189   .187   .185   .183   .182   .181
1.5   .184   .177   .172   .168   .165   .162   .159   .158   .156   .154   .153
1.6   .161   .154   .148   .144   .141   .138   .136   .134   .132   .130   .129
1.7   .140   .133   .128   .123   .120   .117   .115   .113   .111   .110   .108
1.8   .122   .115   .110   .105   .102   .099   .097   .095   .093   .092   .091
1.9   .106   .099   .094   .090   .087   .084   .082   .080   .078   .077   .076
2.0   .092   .086   .081   .077   .073   .071   .069   .067   .065   .064   .063
2.1   .080   .074   .069   .065   .062   .060   .058   .056   .054   .053   .052
2.2   .070   .064   .059   .055   .052   .050   .048   .046   .045   .044   .043
2.3   .061   .055   .050   .047   .044   .042   .040   .039   .037   .036   .035
2.4   .053   .047   .043   .040   .037   .035   .034   .032   .031   .030   .029
2.5   .047   .041   .037   .034   .031   .030   .028   .027   .025   .025   .024
2.6   .041   .035   .032   .029   .026   .025   .023   .022   .021   .020   .019
2.7   .036   .031   .027   .024   .022   .021   .019   .018   .017   .016   .016
2.8   .031   .027   .023   .021   .019   .017   .016   .015   .014   .013   .013
2.9   .027   .023   .020   .018   .016   .014   .013   .012   .012   .011   .010
3.0   .024   .020   .017   .015   .013   .012   .011   .010   .010   .009   .008
3.1   .021   .017   .015   .013   .011   .010   .009   .008   .008   .007   .007
3.2   .019   .015   .013   .011   .009   .008   .008   .007   .006   .006   .006
3.3   .016   .013   .011   .009   .008   .007   .006   .006   .005   .005   .005
3.4   .014   .011   .009   .008   .007   .006   .005   .005   .004   .004   .004
3.5   .013   .010   .008   .007   .006   .005   .004   .004   .004   .003   .003
3.6   .011   .009   .007   .006   .005   .004   .004   .003   .003   .003   .002
3.7   .010   .008   .006   .005   .004   .004   .003   .003   .002   .002   .002
3.8   .009   .007   .005   .004   .003   .003   .003   .002   .002   .002   .002
3.9   .008   .006   .005   .004   .003   .002   .002   .002   .002   .001   .001
4.0   .007   .005   .004   .003   .003   .002   .002   .002   .001   .001   .001
4.1   .006   .005   .003   .003   .002   .002   .001   .001   .001   .001   .001
4.2   .006   .004   .003   .002   .002   .001   .001   .001   .001   .001   .001
4.3   .005   .004   .003   .002   .002   .001   .001   .001   .001   .001   .001
4.4   .005   .003   .002   .002   .001   .001   .001   .001   .001   .001   <.001
4.5   .004   .003   .002   .001   .001   .001   .001   .001   <.001  <.001  <.001
4.6   .004   .002   .002   .001   .001   .001   .001   <.001  <.001  <.001  <.001
4.7   .003   .002   .002   .001   .001   .001   .001   <.001  <.001  <.001  <.001
4.8   .003   .002   .001   .001   .001   .001   <.001  <.001  <.001  <.001  <.001
4.9   .003   .002   .001   .001   .001   <.001  <.001  <.001  <.001  <.001  <.001
5.0   .002   .002   .001   .001   .001   <.001  <.001  <.001  <.001  <.001  <.001
5.1   .002   .001   .001   .001   <.001  <.001  <.001  <.001  <.001  <.001  <.001
5.2   .002   .001   .001   .001   <.001  <.001  <.001  <.001  <.001  <.001  <.001
5.3   .002   .001   .001   <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001
5.4   .002   .001   .001   <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001
5.5   .002   .001   .001   <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001
5.6   .001   .001   .001   <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001
5.7   .001   .001   <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001
5.8   .001   .001   <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001
5.9   .001   .001   <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001
6.0   .001   .001   <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001
6.1   .001   <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001
6.2   .001   <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001
6.3   .001   <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001
6.4   .001   <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001
6.5   .001   <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001
6.6   .001   <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001
6.7   .001   <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001  <.001


Source: From Motulsky H. Intuitive Biostatistics. Oxford/New York: Oxford University Press; 1995: Table A5.4, 370-371. Reprinted by 
permission. 
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Conclusions 


The coefficient of correlation is a useful descriptor of the linear 
relationship in a dataset. It describes the direction and 
strength of the relationship between random variables. The 
significance of the correlation coefficient is highly
dependent on the number of observations in the sample: the 
greater the sample size, the smaller the P value associated 
with a correlation coefficient of a given magnitude.
Therefore, to assess the linear association between two variables 
fully, we should calculate r², as well as r and its P value. 
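
As a worked illustration of this point, take the coefficient r = 0.38 quoted earlier in the chapter and convert it to t for two sample sizes. The sample sizes n = 10 and n = 100 are hypothetical and chosen only for contrast.

```latex
\[
r^2 = 0.38^2 \approx 0.144
\]
\[
n = 10:\qquad t = 0.38\sqrt{\frac{10 - 2}{1 - 0.144}} \approx 1.16
\quad (8\ \text{degrees of freedom})
\]
\[
n = 100:\qquad t = 0.38\sqrt{\frac{100 - 2}{1 - 0.144}} \approx 4.07
\quad (98\ \text{degrees of freedom})
\]
```

With 8 degrees of freedom, Table 46.1 places t = 1.16 between the rows for 1.1 (P = .303) and 1.2 (P = .264), so the correlation would not reach conventional significance. With 98 degrees of freedom, which lies beyond the columns of the table, t = 4.07 corresponds to a P value below .001. In both cases r² is the same, about 0.14, which is why r² is worth reporting alongside r and its P value.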